Abstract

Kalman  filtering  and  multiple  model  adaptive  estimation  (MMAE)  methods 
have  been  applied  by  researchers  in  several  engineering  disciplines  to  a  multitude  of 
problems  as  diverse  as  aircraft  flight  control  and  drug  infusion  monitoring.  MMAE 
methods  have  been  used  to  adapt  to  an  uncertain  noise  environment  and/or  identify 
important  system  parameters  in  these  problems.  All  of  the  model-based  estimation 
(and  control)  problems  considered  in  this  earlier  research  have  at  their  core  a  linear 
(or  mildly  nonlinear)  model  based  on  finite-dimensional  differential  (or  difference) 
equations perturbed by random inputs (noise). However, many real-world systems
are more naturally modeled using an infinite-dimensional continuous-time linear
systems model, such as a partial differential equation
or a time-delayed differential equation. Thus, we are motivated to extend existing
finite-dimensional  techniques,  such  as  the  Kalman  filter,  to  allow  the  engineer  to 
apply  familiar  tools  to  a  larger  class  of  problems. 

The focus of this research is (1) to extend the Kalman filtering technique
to encompass infinite-dimensional continuous-time systems with sampled-data
measurements and (2) to approximate the infinite-dimensional continuous-time system
model descriptions with an essentially equivalent finite-dimensional discrete-time
model upon which a filtering algorithm could be based.

The infinite-dimensional sampled-data Kalman filter (ISKF) is a mathematical
extension of the finite-dimensional sampled-data Kalman filter. The ISKF is
rigorously developed using the definition-theorem-proof format. First, we derive the
linear infinite-dimensional minimum variance unbiased estimator (LIMVUE) based
on a dynamics model driven by a Wiener process (Brownian motion) and based on
the Classical Projection Theorem to handle the state estimator’s measurement update
cycle. Then we create an equivalent discrete-time model description based on
the problem’s natural continuous-time model to provide a means to propagate the
state estimator between measurement updates.

Next, the algorithm to create an essentially equivalent finite-dimensional
discrete-time model from an infinite-dimensional continuous-time model is
constructed by combining an existing technique for producing an equivalent
discrete-time model for a finite-dimensional system and a novel Galerkin-like technique for a
stochastic differential equation that completely captures the important qualities of
the original infinite-dimensional description.

An  extended  example  featuring  these  new  tools  is  presented  for  a  stochastic 
partial  differential  equation.  Specifically,  the  temperature  profile  along  a  slender 
rod  is  estimated  using  a  Kalman  filter  for  the  case  of  a  one-dimensional  stochastic 
heat  equation  with  Neumann  boundary  conditions.  Additionally,  the  MMAE  with  a 
bank  of  Kalman  filters  is  used  to  estimate  the  heat  profile  in  the  face  of  an  unknown 
noise  environment  (zero-mean  white  Gaussian  noises  with  uncertain  covariances  in 
the  dynamics  and/or  measurement  models)  and  to  perform  system  identification 
(to  determine  the  thermal  diffusivity  constant)  in  the  face  of  an  unknown  noise 
environment. 
SAMPLED-DATA KALMAN FILTERING AND
MULTIPLE MODEL ADAPTIVE ESTIMATION FOR
INFINITE-DIMENSIONAL CONTINUOUS-TIME SYSTEMS


I.  Introduction 

In the 1960s, Kalman, Bucy, and Falb [95, 96, 51] devised what we shall
call the (sampled-data measurement) Kalman filter, the (continuous-time
measurement) Kalman-Bucy filter, and the infinite-dimensional (continuous-time
measurement) Kalman-Bucy filter (IKBF), respectively. Shortly after the finite-dimensional
filters  were  put  forth,  Magill  [125]  introduced  a  nonlinear  technique  to  address  the 
case  of  uncertain  model  parameters  using  a  bank  of  Kalman  filters.  This  nonlinear 
technique is now known as multiple model adaptive estimation (MMAE). Kalman
filtering and multiple model methods used to adapt to an uncertain noise
environment and/or identify important system parameters have been applied to dozens of
problems  in  many  engineering  disciplines;  several  examples  of  these  applications  and 
the  MMAE  theory  in  general  are  presented  to  the  reader  in  Chapter  II.  All  of  the 
model-based  estimation  (and  control)  problems  considered  in  the  research  discussed 
in Chapter II have at their core a linear (or mildly nonlinear) model based on a
system of finite-dimensional differential (or difference) equations perturbed by random
inputs  (noise).  In  contrast  to  finite-dimensional  or  lumped-parameter  system  (LPS) 
theory  in  which  the  spatial  behavior  of  the  system  is  concentrated  at  a  single  point  in 
space,  the  field  of  infinite-dimensional  or  distributed-parameter  system  (DPS)  theory 
is  concerned  with  the  dynamic  behavior  of  processes  distributed  in  space  as  well  as 
evolving in time. While continuous-time systems of the lumped parameter variety
are adequately modeled with systems of ordinary differential equations, many
real-world systems are more naturally modeled using an infinite-dimensional DPS model
such as a partial differential equation (PDE), like the heat equation given by

$$\frac{\partial}{\partial t} x(t, p) = \frac{\partial^2}{\partial p^2} x(t, p) + u(t, p) \qquad (1.1)$$

where the temperature, $x(t, p)$, is called the state and $u(t, p)$ is some control input,
or by a time-delayed differential equation (TDE), such as

$$\frac{d}{dt} x(t) = F_1(t) x(t) + F_2(t) x(t - \tau) + u_1(t) + u_2(t - \tau) \qquad (1.2)$$

where the state and control input are partitioned into current-time portions and portions
delayed by amount $\tau$, and $F_j(t)$ for $j = 1, 2$ represents the system dynamics. In
Chapter III, we shall generalize these equations for a stochastic state with random
additive disturbances.

Two good places for an engineer to begin an investigation of DPSs are the
compilation edited by Ray and Lainiotis [163] in 1978, which consists of broad
chapter-long surveys written by DPS experts that fully cover the scope of DPS
theory at that time, and the two-volume collection of benchmark papers put together
by Stavroulakis [184, 185] a few years later. The notes by Curtain and Pritchard
[38] present a solid continuous-time mathematical foundation for
infinite-dimensional linear system theory. Note that, generally speaking, there are two main
camps of researchers at work in this field: those concerned with the practical
implementation of a solution for an application most often refer to the field as DPS
theory, while those more interested in the theoretical or mathematical foundations
talk of infinite-dimensional linear systems theory. However, the terms are most often
treated as synonyms in the literature, as they are in this research.

We are interested in both areas of research, hence the focus of this research
is twofold. On the mathematical foundations side, we begin by extending the
Kalman filtering technique to encompass infinite-dimensional sampled-data
measurement systems: that is, we will derive, in Chapter III, the infinite-dimensional
sampled-data Kalman filter (ISKF), thus completing the mathematical quartet of
filters begun over forty years ago. Then we give a method for mapping an
infinite-dimensional continuous-time model to an equivalent infinite-dimensional
discrete-time model, so that the new ISKF can be used for both continuous-time and
discrete-time models. On the practical side, we use a subspace spectral method, in the
spirit of the Galerkin technique [62] for stochastic differential equations, to create
an essentially-equivalent finite-dimensional discrete-time model from the equivalent
infinite-dimensional discrete-time model; this approximation completely captures the
important qualities of the original infinite-dimensional description. Thus, we have
crafted a new method for transforming a DPS problem into an LPS problem that
can be solved using existing tools and techniques.

1.1  Overview 

The primary purpose of this research is to extend the applicability of Kalman
filtering and MMAE to problems well-modeled using infinite-dimensional
continuous-time linear systems with sampled-data measurements^1. In this research, we derive
the ISKF algorithm; this is accomplished in Chapter III. The ISKF can be
applied to a large class of DPS problems modeled by an infinite-dimensional stochastic
differential equation.

In Figure 1.1 we have captured the primary solution paths taken by other
researchers in the top two paths and this research in the bottom path to solve
infinite-dimensional problems using finite-dimensional tools ($\mathcal{F}_1$, $\mathcal{F}_2$, $\mathcal{F}_{\mathrm{opt}}$). From left
to right, the path begins with a projection of the “truth” onto an infinite-dimensional
continuous-time system and discrete-time (sampled-data) measurement model. The
top two paths conceptually represent existing suboptimal methods used to map
the infinite-dimensional continuous-time model to a finite-dimensional discrete-time
model so that a digital filtering algorithm can be used to estimate the state and/or
parameter of interest associated with the approximate model. The top path could
represent an infinite-dimensional system modeled by a partial differential equation
with stochastic (and perhaps deterministic) inputs. Then, $\mathcal{S}_1$ represents the process
of approximating the spatial (and thus the notation $\mathcal{S}_1$) partial derivatives, thus
reducing the dimension of the model to some finite number. Finite elements [89, 30]
and finite differences [44, 111] are common approximation methods. A
discrete-time model (that is perhaps equivalent in the same sense as in our path as explained
below) could be found next, and thus $\mathcal{T}_1$ could be $\mathcal{T}_{\mathrm{opt}}$ or any other ad hoc technique.
The middle path could be used to demonstrate the time-delay problem, modeled by
a stochastic retarded differential equation. The measurement time delay issue could
be approximated by $\mathcal{T}_2$. Then $\mathcal{S}_2$ would be an identity operation as it is not needed
for this problem.

^1 Infinite-dimensional discrete-time linear systems with discrete-time measurements are also
covered.

Figure 1.1 Mapping the Infinite-Dimensional Continuous-Time Model to
Finite-Dimensional Discrete-Time Models.

The bottom path is fully developed in Chapter IV where we demonstrate
our technique for creating an equivalent infinite-dimensional discrete-time model
(denoted by $\mathcal{T}_{\mathrm{opt}}$ on Figure 1.1) from the original infinite-dimensional
continuous-time model. The model is termed equivalent^2 because the (infinite-dimensional)
state^3 is identical in the continuous-time and discrete-time models at an arbitrary
time instant. Thus an equivalent discrete-time model properly characterizes the
continuous-time dynamics model of the system. Next, we demonstrate how to
create an essentially-equivalent finite-dimensional discrete-time model by projecting the
infinite-dimensional model onto a finite-dimensional subspace, denoted by $\mathcal{S}_{\mathrm{opt}}$. We
need a finite-dimensional subspace so that we can use a digital computer to
implement an algorithm. How the projection is undertaken is very important because the
dimensions excised from the infinite-dimensional model should be those “directions”
of the vector space which are dominated by the noise inputs (i.e., the uncertainties in
the system model, measurement inaccuracies, and other disturbances) and thus are
of little value to the engineer. So, by essentially-equivalent we
mean that the most essential subspace of the infinite-dimensional space is retained in
the model. Additionally, unlike $\mathcal{S}_1$, there is no finite element approximation to
spatial differentiation, but instead a projection using a finite number of basis functions
(versus the infinite number needed to describe the original function completely).

^2 When we have a continuous-time dynamics model, we follow Maybeck [129] and create an
equivalent discrete-time model prior to designing a digital filter to process the data optimally. A
less desirable method would be to design a continuous-time filter matched to the continuous-time
model and then discretize the filter to allow computation on a digital computer. This second
method involves numerically solving a Riccati equation, which should be avoided if at all possible
because these numerical solutions are often unstable (despite theoretical solution stability); hence the solution
might not converge. Moreover, the discretized version of an optimal continuous-time-measurement
algorithm is not guaranteed to be optimal and generally is not.

^3 For finite-dimensional systems, the state is the set of numbers (that may change as time
progresses) used to describe the system; the state and the inputs to the system determine the
behavior of the system [129]. More generally, the state is an element in the smallest dimensional
vector space that fully describes the system and, with knowledge of the inputs (which includes the
noises and uncertainties), determines the behavior of the system.

By creating a finite-dimensional model, we are able to take full advantage of the
existing body of knowledge concerning digital filtering and simulation techniques and
software. Finally, we use the discrete Kalman filter ($\mathcal{F}_{\mathrm{opt}}$) to process the available
data optimally to determine a solution recursively. The solution space might also
be called the result space since not all methods are guaranteed to produce an actual
solution; our process is.
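As a rough illustration of projecting onto a finite number of basis functions and then forming an equivalent discrete-time model, the following Python sketch truncates the one-dimensional heat equation with Neumann boundary conditions to its first few cosine modes. It is not the algorithm developed in Chapter IV. The cosine basis, the diffusivity `alpha`, the rod length `length`, the independent per-mode noise strength `q`, and the function names are all assumptions made for this example.

```python
import numpy as np


def modal_discrete_model(alpha, length, dt, n_modes, q):
    """Discrete-time model for the first n_modes cosine modes of the heat equation.

    Illustrative assumptions (not taken from the text): each modal amplitude a_k
    obeys da_k/dt = lam_k * a_k + w_k, with independent white noises w_k of
    strength q (a scalar or one value per mode).
    """
    k = np.arange(n_modes)
    # Eigenvalues of alpha * d^2/dp^2 on [0, length] with Neumann boundary conditions.
    lam = -alpha * (k * np.pi / length) ** 2
    # Exact state transition over one sample period for each decoupled mode.
    phi = np.exp(lam * dt)
    q = np.broadcast_to(np.asarray(q, dtype=float), (n_modes,))
    qd = np.empty(n_modes)
    # Mode 0 has lam = 0, so the noise simply integrates: variance grows as q * dt.
    qd[0] = q[0] * dt
    # Modes k >= 1: integral of q * exp(2 * lam * s) ds from 0 to dt.
    qd[1:] = q[1:] * (1.0 - np.exp(2.0 * lam[1:] * dt)) / (-2.0 * lam[1:])
    return np.diag(phi), np.diag(qd)


def temperature_profile(amplitudes, p, length):
    """Rebuild x(t, p) from modal amplitudes using orthonormal cosine basis functions."""
    p = np.asarray(p, dtype=float)
    profile = np.full_like(p, amplitudes[0] / np.sqrt(length))
    for k in range(1, len(amplitudes)):
        profile += amplitudes[k] * np.sqrt(2.0 / length) * np.cos(k * np.pi * p / length)
    return profile


# Hypothetical rod: unit length, diffusivity 0.1, four retained modes.
phi, qd = modal_discrete_model(alpha=0.1, length=1.0, dt=0.05, n_modes=4, q=1e-3)
```

Because the retained modes are decoupled in this sketch, the state transition and discrete-time noise covariance are diagonal, and a standard finite-dimensional Kalman filter could be run directly on the modal amplitudes. Higher modes decay faster, which is one informal way to see why truncating them can lose little of the signal.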

1.2  Historical  Overview 

In 1960, the American Society of Mechanical Engineers (ASME) published
Kalman’s [95] extraordinary extension of the Wiener filter [208]; shortly thereafter,
Kalman and Bucy added a second paper tackling the more mathematically
sophisticated continuous-time problem, whose solution is often called the Kalman-Bucy filter [96]. Kalman’s
filter employed a new approach that embraced signals of finite time duration,
time-varying system models, and nonstationary noise processes [129] in the language of the
new state space (time-domain) formulation [217, 97, 164] versus frequency-domain
methods required by most Wiener filter techniques [208, 207, 129]. Kalman’s work
has been republished in many collections, such as [183, 12]. For a control theory
point of view, see any of these excellent texts: [91, 141, 3, 129, 130, 131] or these
signal processing texts: [170, 100, 77], or these newer texts that concentrate on
utilizing the Kalman and related filters for tracking: [190, 193, 14]. Additionally, Kalman
filtering can be viewed much as Kalman did in [95] within the mathematical
framework of linear algebra and functional analysis [122, 33], while Kalman-Bucy filtering
[96, 32] is an especially important application of stochastic calculus [104, 159, 66].

The seeds of the multiple model technique were planted by researchers seeking
to extend Kalman’s work to the case of uncertain Gaussian noise process strength
using a bank of parallel Kalman filters. Magill [125] was the first to publish such an
extension using a structured multiple model approach. Specifically, he investigated
optimal estimation of stochastic processes which can be well modeled by a sampled
Gauss-Markov process with some initially unknown yet deterministic parameters^4;
in Magill’s research, the uncertain parameters affected the statistics of the
zero-mean white Gaussian dynamics driving noise. State estimation in this case is an
adaptive estimation process since the estimator must adapt itself to an unknown
noise environment [59]. The system chooses the “correct” parameter value from a
discrete a priori set based on the hypothesis conditional probability calculations.
Even though a parameter may vary over a continuous range of values, the MMAE
method fundamentally assumes that the true parameter can be found in a discrete
set. Restricting ourselves to a discrete set actually improves the performance of
the MMAE since top performance occurs when the distinguishability between the
elemental filters is high. When the filters “look” the same, the MMAE cannot
readily “decide” which model coded in a particular elemental filter best matches the
true scenario as observed in the measurements; consequently the probability flow
is hampered. This topic is discussed in considerable depth in the next chapter in
Sections 2.3 and 2.7.3.


^4 What is a parameter? A parameter is usually constant (or essentially constant over the time
period of interest) in the form of a scalar, vector, or matrix that relates two (or more) quantities
(states or signals) that vary with time (or in space). A simple deterministic parameter is the
assumed-constant mass in Newton’s equation relating the force on a body undergoing acceleration:
$F(t) = m a(t)$, where $F(t)$ is the time-varying force, $a(t)$ is the time-varying acceleration, and the
mass $m$ is the parameter relating the two time-varying “signals.” If the parameter is not strictly
constant, but varies slowly in comparison to the other quantities, we could represent the parameter
by a set of values. For example, let the set of masses be expressed as a piece-wise constant function
of time.
We continue our introduction to this research with a simple physically meaningful
example that motivates the use of a Kalman filter and the MMAE methodology.
The  multiple  model  method  effectively  partitions  a  complex  problem  into  adaptive 
and  nonadaptive  parts.  The  adaptive  part  is  the  MMAE  framework  that  blends  the 
estimates  of  several  filters  together  to  produce  excellent  results  —  usually  superior  to 
that  of  a  single  filter.  The  nonadaptive  portion  is  comprised  of  the  elemental  filters 
themselves.  While  these  filters  could  be  made  adaptive,  this  will  not  be  pursued  in 
this  research. 

1.3  Example:  Navigation 

To  help  the  reader  envision  how  an  MMAE  scheme  improves  state  estimation 
when  there  are  uncertainties  in  a  subset  of  the  parameters  describing  the  system 
dynamics/measurement  model  or  the  statistics  characterizing  the  dynamics  driving 
noise  or  measurement-corruption  noise,  a  simple  example  that  closely  resembles  and 
then  extends  the  “lost  at  sea”  example  by  Maybeck  [129]  is  developed  in  Subsections 
1.3.1  and  1.3.2  respectively. 

1.3.1 Kalman Filter with Known Noise Strengths. Suppose that you and
a friend are sailing north from Port A to Port B. We can use a chronograph to
compute our east-west position (longitude) and a sextant to determine our north-south
(latitude) position. For simplicity, we assume that we have a perfect chronograph
and associated charts and can thus exactly calculate our longitude. Thus you only
need to find the scalar latitude position to navigate from Port A to Port B.

The one-dimensional noise that corrupts these scalar measurements obtained
with a sextant by sighting the stars is an additive noise process. While you have
never used your star sighting skills to navigate the sea, your friend is an
accomplished expert. You take the initial star sighting to compute your latitude; let’s call
it measurement $z(t_1) = z_1$ and use this to establish your position at time $t_1$. The
precision of your measurements is characterized by the standard deviation, given by
$\sigma_{z_1}$, or equivalently, the expected variance, $\sigma_{z_1}^2$, of the estimate. With this
information, we can establish the conditional probability density function (PDF) of $x(t_1)$,
for the estimated position at time $t_1$, conditioned on the observed measurement $z_1$.
Thus, we have two statistical moments with which to describe the conditional PDF
$f_{x(t_1)|z(t_1)}(\xi | z_1)$. Furthermore, if we assume that this conditional PDF is Gaussian,
then we have completely characterized the PDF, since only the first two moments
are required to characterize the Gaussian PDF fully, as shown in Figure 1.2.

Figure 1.2 First Sextant Sighting.

Based on the information so far, our best position (or state) estimate is

$$\hat{x}(t_1) = z_1 \qquad (1.3)$$

and the variance of the estimate error is

$$\sigma_x^2(t_1) = \sigma_{z_1}^2 \qquad (1.4)$$

Figure 1.3 Second Sextant Sighting. The solid black line is for the second sighting
and the dash-dotted gray line is for the initial sighting. The areas under the two
PDFs are equivalent. The peak of the second PDF is higher since its width
(characterized by the standard deviation $\sigma_{z_2}$ of the PDF) is smaller than the width of the
first PDF.


A few moments later, your friend takes a sighting at time $t_2 \approx t_1$, and we
assume that the true position has not changed at all; this is equivalent to taking
the two measurements at the same time with two identical sextants. Label his
measurement $z_2$ with error variance $\sigma_{z_2}^2$. Note that the difference in the models is
confined to the assessed skill of the sextant operator. Figure 1.3 shows conditional
PDFs for both measurements.
Now we have two measurements with which to estimate our position. How do
we combine them to yield the best estimate? If we didn’t have any knowledge about
the precision of the measurements, we would simply average the measurements to
estimate the position at time $t_2$

$$\hat{x}(t_2) = \tfrac{1}{2} z_1 + \tfrac{1}{2} z_2 \qquad (1.5)$$


Observe that the coefficients sum to one. Since we know the precision of each
measurement, in terms of the expected variance, let’s use them. A clever person
might propose a weighted average of the measurements $z_1$ and $z_2$ in terms of the
expected variances to yield a position estimate at time $t_2$ of [17]

$$\hat{x}(t_2) = \left[ \frac{\sigma_{z_2}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2} \right] z_1 + \left[ \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2} \right] z_2 \qquad (1.6)$$


where the coefficients yet again sum to one and the expected variance of the position
estimate conditioned on the accuracy of the measurements is

$$\sigma_x^2(t_2) = \frac{1}{(1/\sigma_{z_1}^2) + (1/\sigma_{z_2}^2)} \qquad (1.7)$$


In fact, it can be shown that Equations (1.6) and (1.7) correspond to the conditional
mean and error covariance of a Gaussian PDF conditioned on the two measurements
[129]. Given this conditional Gaussian PDF, this estimate is optimal under many
criteria, such as the minimum mean-squared error; see Chapter II in general and
Section 2.3.2 in particular. The mean is simply a weighted average of the two
measurements and the variance is reduced by adding a second measurement, regardless
of the precision of or the variance associated with the second measurement, assuming
that the variance is not infinite. Curiously, the equation for the new variance is of
the same structure as that of adding two resistors in parallel. Next we substitute
Equation (1.3) into Equation (1.6) and rewrite it in a “predictor-corrector” form

$$\hat{x}(t_2) = \hat{x}(t_1) + K(t_2) [z_2 - \hat{x}(t_1)] \qquad (1.8)$$


where $K(t_2)$, the so-called Kalman gain, which is “chosen” to minimize the
mean-squared error between the position estimate and the true position, is given by

$$K(t_2) = \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2} \qquad (1.9)$$


Similarly, the variance in Equation (1.7) can be rewritten using Equation (1.9) with
Equation (1.4) as

$$\sigma_x^2(t_2) = \sigma_x^2(t_1) - K(t_2) \sigma_x^2(t_1) \qquad (1.10)$$

Thus, the Gaussian conditional PDF $f_{x(t_2)|z(t_1),z(t_2)}(\xi | z_1, z_2)$ at time $t_2$ is completely
specified by mean $\mu = \hat{x}(t_2)$ and variance $\sigma^2 = \sigma_x^2(t_2)$ shown in Equations (1.8)
and (1.10) respectively, with the Kalman filter gain defined in Equation (1.9) and
displayed in Figure 1.4.
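The equivalence between the weighted average of Equations (1.6) and (1.7) and the predictor-corrector form of Equations (1.8) through (1.10) can be checked numerically. The Python sketch below uses hypothetical function names and made-up sighting values (latitudes 10 and 12 with variances 4 and 1); only the formulas come from the text.

```python
def fuse_weighted(z1, var1, z2, var2):
    """Weighted average, Equations (1.6) and (1.7)."""
    total = var1 + var2
    mean = (var2 / total) * z1 + (var1 / total) * z2
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mean, var


def fuse_predictor_corrector(z1, var1, z2, var2):
    """Predictor-corrector form, Equations (1.3), (1.4) and (1.8) to (1.10)."""
    x1, p1 = z1, var1                 # first sighting sets the initial estimate
    gain = p1 / (p1 + var2)           # Kalman gain K(t2), Equation (1.9)
    x2 = x1 + gain * (z2 - x1)        # Equation (1.8)
    p2 = p1 - gain * p1               # Equation (1.10)
    return x2, p2


# Hypothetical sightings: the second (expert) sighting is more precise.
mean_a, var_a = fuse_weighted(10.0, 4.0, 12.0, 1.0)
mean_b, var_b = fuse_predictor_corrector(10.0, 4.0, 12.0, 1.0)
print(mean_a, var_a)
print(mean_b, var_b)
```

Both routes give the same mean and variance up to floating-point rounding, and the fused variance is smaller than either individual variance, which matches the parallel-resistor analogy in the text.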

Now  that  we  have  shown  how  to  update  the  initial  estimate  with  a  second 
measurement,  let’s  add  some  dynamics  to  the  problem  in  order  to  estimate  a  future 
position  before  we  take  a  third  measurement.  Basic  kinematics  enables  us  to  model 
your  change  in  position  as 

$$dx/dt = u + w \qquad (1.11)$$


where $dx/dt$ represents the rate of change in position or velocity, $u$ is some nominal
velocity, and the uncertainty in the velocity due to unmodeled effects and other
disturbances is described with a zero-mean Gaussian random variable $w$, often simply
referred to as “noise.” This noise is uncorrelated in time (i.e., it is a white process)
and has strength $Q$; the strength $Q$ of the noise corresponds to the amplitude of the
power spectral density (PSD) curve over all frequencies, which for this white noise
process is constant for all time. With this simple model, we can predict the position
at time $t_3$ as

$$\hat{x}(t_3^-) = \hat{x}(t_2) + u [t_3 - t_2] \qquad (1.12)$$

where the time $t_3^-$ is a notational convenience representing our prediction at time $t_3$
before a new measurement is added and the variance of the estimate grows to

$$\sigma_x^2(t_3^-) = \sigma_x^2(t_2) + Q [t_3 - t_2] \qquad (1.13)$$

With this construction, we have in effect propagated the Gaussian conditional PDF
from time $t_2$ until time $t_3$, which we represent as $t_3^-$ in order to keep track of our
propagation and update stages. For the linear dynamics model in Equation (1.11)
driven by known inputs $u$ and white Gaussian noise $w$, the PDF will continue to be
Gaussian [129].

Figure 1.4 First and Second Sextant Sightings Combined. Solid black line is the
combination of the two sightings, the dashed gray line is for the second sighting, and
the dash-dotted gray line is for the first sighting.
Next, we add a third measurement taken by your friend and update the position
estimate. This update will complete the second step of this two-stage process that
consists of propagating the optimal estimate forward in time using the dynamics
model and then updating the estimate with a new measurement. Note that the
variance of our estimate grows during the propagation stage for this problem because
of the uncertainty in our initial position estimate $\hat{x}(t_1)$ and the uncertainty in how
our position changes over time (due to the noise $w$). On the other hand, the variance is
reduced during the update stage because we have added new information to refine
the position estimate. Using Equations (1.8) and (1.10) with the Kalman filter gain
defined in Equation (1.9) as our template, we write the optimal position estimate
at time $t_3$ as


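$$\hat{x}(t_3) = \hat{x}(t_3^-) + K(t_3) [z_3 - \hat{x}(t_3^-)] \qquad (1.14)$$

with variance

$$\sigma_x^2(t_3) = \sigma_x^2(t_3^-) - K(t_3) \sigma_x^2(t_3^-) \qquad (1.15)$$

where the Kalman gain is

$$K(t_3) = \frac{\sigma_x^2(t_3^-)}{\sigma_x^2(t_3^-) + \sigma_{z_3}^2} \qquad (1.16)$$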

Hence, the optimal estimate of the position $x$ at time $t_3$ is the optimal predicted
position just before the latest measurement, $\hat{x}(t_3^-)$, plus a correction term based on
the weighted residual between the new measurement and the measurement
prediction, $K(t_3)[z_3 - \hat{x}(t_3^-)]$, where the Kalman gain $K(t_3)$ provides the weighting and
$[z_3 - \hat{x}(t_3^-)]$ is called the residual. The weighting represents our relative confidence
in the measurements and predicted position estimates. We can continue this process
of propagating and updating indefinitely until our objective (reaching Port B) is
achieved. If our sighting variances $\sigma_{z_i}^2$ for each measurement are accurate, then we
can expect good results from this single Kalman filter. However, if there is some
uncertainty in the declared statistics (and/or models), then the performance generally
degrades from what we would otherwise expect.
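The propagate-then-update cycle of Equations (1.12) through (1.16) can be written as two small functions. The following Python sketch is only an illustration: the function names and every numerical value (nominal velocity, noise strength, sighting values) are hypothetical and not taken from the text.

```python
def propagate(x, p, u, q, dt):
    """Propagation stage, Equations (1.12) and (1.13): the variance grows."""
    return x + u * dt, p + q * dt


def update(x_minus, p_minus, z, r):
    """Update stage, Equations (1.14) to (1.16): the variance shrinks."""
    gain = p_minus / (p_minus + r)              # Kalman gain, Equation (1.16)
    x_plus = x_minus + gain * (z - x_minus)     # Equation (1.14)
    p_plus = p_minus - gain * p_minus           # Equation (1.15)
    return x_plus, p_plus, gain


# Hypothetical values: estimate 11.6 with variance 0.8 at t2, nominal velocity 2,
# noise strength Q = 0.5, one time unit between sightings, third sighting z3 = 14
# with variance 1.
x_minus, p_minus = propagate(11.6, 0.8, u=2.0, q=0.5, dt=1.0)
x_plus, p_plus, gain = update(x_minus, p_minus, z=14.0, r=1.0)
print(x_minus, p_minus)
print(x_plus, p_plus, gain)
```

Calling `propagate` and `update` alternately reproduces the recursion described above. A gain near one means the filter trusts the new sighting, while a gain near zero means it trusts the prediction.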
1.3.2 Multiple Model Adaptive Estimation. Now that we have shown how a
Kalman filter can assist our navigation solution by improving our latitude estimate,
we shall consider the case for which there is an unpredictable uncertainty with the
quality of our sextant measurements. Specifically, we will consider a system
characterized by a known dynamics noise strength and two possible measurement noise
covariances in order to motivate the utility of a multiple model estimator. For our
example, these two “choices” represent the measurement-corruption variances for a
properly calibrated sextant and for an uncalibrated sextant. This problem is
similar to the problem that Magill [125] studied, except that he did not limit himself
to just two possibilities and he was concerned with an uncertain dynamics driving
noise.

Once again, you and a friend are sailing from Port A to Port B on a clear night.
You take the initial sighting using the calibrated sextant and determine your latitude
$z(t_1) = z_1$ at time $t_1$ with variance $\sigma_{z_1}^2$. As you hand the sextant to your friend to take
the remainder of the measurements, it slips from your grip and falls to the deck. You
are uncertain if this fall has spoiled the calibration of your instrument; however, you
have a measurement variance recorded in your log book for uncalibrated sextants.
So, you are faced with more uncertainty in your measurements... or are you? The
question remains whether the measurement variance for the subsequent sightings is
equal to $R_{\mathrm{cal}}$ or $R_{\mathrm{uncal}}$, where $R_{\mathrm{cal}} < R_{\mathrm{uncal}}$. But don’t despair: the MMAE
can tell you whether the sextant is most likely calibrated or not after just a few more
measurements, and it can give you the best latitude estimate possible. The MMAE
technique is well suited to give the proper “advice.”

In this simple example, we shall operate two Kalman filters $F_1$ and $F_2$ in
parallel (see Figure 1.5), one for a calibrated sextant and one for an uncalibrated
sextant, each processing the same measurements, $z(t_i) = z_i$, and each propagating
the estimate using the same algorithm developed in Section 1.3.1. In block D we
determine the probability, $p_j$, that elemental filter $F_j$ is the best modeled filter. The
E block processes the estimates from the elemental filters and creates an overall
position estimate.

Figure 1.5 Multiple Model Adaptive Estimation: Two Kalman Filters in Parallel

Let’s begin by building two filters and label the first filter “cal” for a properly
calibrated sensor and the second “uncal” for an uncalibrated sensor. Recall that
you performed the first measurement using the calibrated sextant, call it $z(t_1) = z_1$
at time $t_1$. The initial position estimate is the same for both the “cal” and “uncal”
filters

$$\hat{x}_{\mathrm{cal}}(t_1) = \hat{x}_{\mathrm{uncal}}(t_1) = z_1$$  (1.17)

with the same initial measurement-corruption noise covariance

$$\sigma_{z_1}^2 = R_{\mathrm{cal}}$$  (1.18)

since it was taken using the calibrated sextant before it was dropped. Using
Equations (1.12) and (1.13), the position estimate at time $t_i^-$ is

$$\hat{x}_{\mathrm{cal}}(t_i^-) = \hat{x}_{\mathrm{cal}}(t_{i-1}) + u\,[t_i - t_{i-1}]$$  (1.19)

$$\hat{x}_{\mathrm{uncal}}(t_i^-) = \hat{x}_{\mathrm{uncal}}(t_{i-1}) + u\,[t_i - t_{i-1}]$$  (1.20)

with variances for the calibrated and uncalibrated models

$$\sigma_{\mathrm{cal}}^2(t_i^-) = \sigma_{\mathrm{cal}}^2(t_{i-1}) + Q\,[t_i - t_{i-1}]$$  (1.21)

$$\sigma_{\mathrm{uncal}}^2(t_i^-) = \sigma_{\mathrm{uncal}}^2(t_{i-1}) + Q\,[t_i - t_{i-1}]$$  (1.22)

where the dynamics-process noise strength, $Q$, is assumed constant for all time $t_i$.

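Next, the two filters are updated at time $t_i$ using the latest measurement
$z(t_i) = z_i$. Following the form of Equations (1.14) through (1.16), we obtain

$$\hat{x}_{\mathrm{cal}}(t_i) = \hat{x}_{\mathrm{cal}}(t_i^-) + K_{\mathrm{cal}}(t_i)\,[z_i - \hat{x}_{\mathrm{cal}}(t_i^-)]$$  (1.23)

$$\hat{x}_{\mathrm{uncal}}(t_i) = \hat{x}_{\mathrm{uncal}}(t_i^-) + K_{\mathrm{uncal}}(t_i)\,[z_i - \hat{x}_{\mathrm{uncal}}(t_i^-)]$$  (1.24)

with variances

$$\sigma_{\mathrm{cal}}^2(t_i) = \sigma_{\mathrm{cal}}^2(t_i^-) - K_{\mathrm{cal}}(t_i)\,\sigma_{\mathrm{cal}}^2(t_i^-)$$  (1.25)

$$\sigma_{\mathrm{uncal}}^2(t_i) = \sigma_{\mathrm{uncal}}^2(t_i^-) - K_{\mathrm{uncal}}(t_i)\,\sigma_{\mathrm{uncal}}^2(t_i^-)$$  (1.26)

and Kalman gains of

$$K_{\mathrm{cal}}(t_i) = \sigma_{\mathrm{cal}}^2(t_i^-)\,[\sigma_{\mathrm{cal}}^2(t_i^-) + R_{\mathrm{cal}}]^{-1}$$  (1.27)

$$K_{\mathrm{uncal}}(t_i) = \sigma_{\mathrm{uncal}}^2(t_i^-)\,[\sigma_{\mathrm{uncal}}^2(t_i^-) + R_{\mathrm{uncal}}]^{-1}$$  (1.28)
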
where the measurement precision is quantified by the assumed constant variances
of the corruption noises, $R_{\mathrm{cal}}$ and $R_{\mathrm{uncal}}$, for all time $t_i$. The difference between the
predicted measurement, $\hat{x}_{\mathrm{cal}}(t_i^-)$ or $\hat{x}_{\mathrm{uncal}}(t_i^-)$, and the actual observation, for all
time $t_i$, is termed the residual in each of the two elemental filters. The residual,
$[z_i - \hat{x}_{\mathrm{cal}}(t_i^-)]$ or $[z_i - \hat{x}_{\mathrm{uncal}}(t_i^-)]$, is our connection to the “real” world in Equations
(1.23) and (1.24), respectively.
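
The propagate and update cycle of the two elemental filters can be sketched as follows. This is only an illustration under stated assumptions: the class name `ElementalFilter`, the symbol `u` for the nominal velocity used in the propagation step, and all numerical values are hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ElementalFilter:
    x: float    # current position estimate
    var: float  # variance of the position estimate
    R: float    # measurement noise variance assumed by this filter

    def propagate(self, u, Q, dt):
        """Propagate over dt, in the form of Equations (1.19)-(1.22)."""
        self.x += u * dt     # assumed nominal velocity u (hypothetical name)
        self.var += Q * dt   # dynamics noise strength Q grows the variance

    def update(self, z):
        """Measurement update, in the form of Equations (1.23)-(1.28)."""
        residual = z - self.x
        residual_var = self.var + self.R   # filter-computed residual variance
        gain = self.var / residual_var     # Kalman gain
        self.x += gain * residual
        self.var -= gain * self.var
        return residual, residual_var


# Hypothetical values: first sighting z1 taken with the calibrated sextant.
z1, R_cal, R_uncal = 30.0, 1.0, 9.0
cal = ElementalFilter(x=z1, var=R_cal, R=R_cal)
uncal = ElementalFilter(x=z1, var=R_cal, R=R_uncal)

# Both filters see the same dynamics and the same second measurement.
for f in (cal, uncal):
    f.propagate(u=1.0, Q=0.5, dt=2.0)
    f.update(z=33.0)
print(cal, uncal)
```

Because the "uncal" filter assumes a larger measurement variance, its gain is smaller and it moves less toward the new sighting than the "cal" filter does.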
Now that we have our two elemental filters set up, how do we decide which
one is the most accurate so that we can form the best estimate of our position? In
this development, we have implicitly assumed that the sextant is either calibrated or
uncalibrated and that one of the filters will give us the best results. The probability,
$\mathrm{pr}\{\cdot\}$, that a hypothesis is true given the observations is termed the hypothesis
conditional probability. The hypothesis conditional probabilities for the calibrated
sextant, $p_{\mathrm{cal}}(t_i)$, and uncalibrated sextant, $p_{\mathrm{uncal}}(t_i)$, are defined as

$$p_{\mathrm{cal}}(t_i) = \mathrm{pr}\{\mathsf{R} = R_{\mathrm{cal}} \mid \mathsf{z}(t_1) = z_1, \mathsf{z}(t_2) = z_2, \ldots, \mathsf{z}(t_i) = z_i\}$$  (1.29)

$$p_{\mathrm{uncal}}(t_i) = \mathrm{pr}\{\mathsf{R} = R_{\mathrm{uncal}} \mid \mathsf{z}(t_1) = z_1, \mathsf{z}(t_2) = z_2, \ldots, \mathsf{z}(t_i) = z_i\}$$  (1.30)

such that $p_{\mathrm{cal}}(t_i), p_{\mathrm{uncal}}(t_i) \ge 0$, and they sum to one: $p_{\mathrm{cal}}(t_i) + p_{\mathrm{uncal}}(t_i) = 1$ for
every time $t_i$. We introduce the following shorthand notation for the PDFs for the
measurement at time $t_i$ conditioned on the assumed parameter value for $\mathsf{R}$ and the
sequence of observations from time $t_1$ through time $t_{i-1}$; this is also known simply
as the conditional PDF for the incoming measurement and is given here as [130]

$$f_{\mathrm{cal}}(t_i) = f_{\mathsf{z}(t_i) \mid \mathsf{R}, \mathsf{z}(t_1), \ldots, \mathsf{z}(t_{i-1})}(z_i \mid R_{\mathrm{cal}}, z_1, \ldots, z_{i-1})$$  (1.31)

$$f_{\mathrm{uncal}}(t_i) = f_{\mathsf{z}(t_i) \mid \mathsf{R}, \mathsf{z}(t_1), \ldots, \mathsf{z}(t_{i-1})}(z_i \mid R_{\mathrm{uncal}}, z_1, \ldots, z_{i-1})$$  (1.32)

These PDFs are a function of the residual, $[z_i - \hat{x}_{\mathrm{cal}}(t_i^-)]$ or $[z_i - \hat{x}_{\mathrm{uncal}}(t_i^-)]$,
as seen in Equations (1.23) and (1.24), and the filter-computed residual variance,
$[\sigma_{\mathrm{cal}}^2(t_i^-) + R_{\mathrm{cal}}]$ or $[\sigma_{\mathrm{uncal}}^2(t_i^-) + R_{\mathrm{uncal}}]$, which appear in Equations (1.27) and (1.28).
Thus, these PDFs contain the information we use to calculate the hypothesis
conditional probabilities. These probabilities indicate how well the hypothesized models,
$\mathsf{R} = R_{\mathrm{cal}}$ for the calibrated sextant filter model and $\mathsf{R} = R_{\mathrm{uncal}}$ for the uncalibrated
sextant filter model, match the real world. We judge the quality of the match using
the sequence of residuals created from the predicted measurements $\hat{x}_{\mathrm{cal}}(t_i^-)$ and
$\hat{x}_{\mathrm{uncal}}(t_i^-)$ and the observed measurements $z_i$ for all time $t_i$. The filter model that
best matches reality will produce the “whitest” sequence of residuals⁵, a mean
that is closest to zero, and a residual sequence variance that is most in consonance
with the filter-computed variance, $[\sigma_{\mathrm{cal}}^2(t_i^-) + R_{\mathrm{cal}}]$ or $[\sigma_{\mathrm{uncal}}^2(t_i^-) + R_{\mathrm{uncal}}]$. This
will, in turn, increase the value of the measurement conditional PDF evaluated for
a measurement at time $t_i$, which in turn leads to a higher hypothesis conditional
probability that this particular model matches the real world the best. Additionally,
the measurement uncertainties are visible in the Kalman gain Equations (1.27) and
(1.28). Furthermore, it can be shown that these measurement conditional PDFs are
Gaussian with a mean equal to the predicted state estimate, $\hat{x}_{\mathrm{cal}}(t_i^-)$ or $\hat{x}_{\mathrm{uncal}}(t_i^-)$,
and a known variance, $[\sigma_{\mathrm{cal}}^2(t_i^-) + R_{\mathrm{cal}}]$ or $[\sigma_{\mathrm{uncal}}^2(t_i^-) + R_{\mathrm{uncal}}]$.
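
Since the measurement conditional PDF is described above as Gaussian, with mean equal to the predicted estimate and variance equal to the filter-computed residual variance, evaluating it for a given measurement reduces to a one-line formula. The sketch below assumes that scalar Gaussian form; the function name `measurement_pdf` and the numbers are illustrative assumptions.

```python
import math


def measurement_pdf(z, x_pred, var_pred, R):
    """Gaussian density of measurement z for a filter that assumes noise variance R.

    Mean is the predicted estimate x_pred; variance is var_pred + R,
    the filter-computed residual variance.
    """
    residual = z - x_pred
    residual_var = var_pred + R
    return math.exp(-0.5 * residual ** 2 / residual_var) / math.sqrt(
        2.0 * math.pi * residual_var
    )


# Hypothetical values: both filters predict 32.0 with variance 2.0.
R_cal, R_uncal = 1.0, 9.0
for z in (33.0, 38.0):  # a small residual, then a large residual
    f_cal = measurement_pdf(z, 32.0, 2.0, R_cal)
    f_uncal = measurement_pdf(z, 32.0, 2.0, R_uncal)
    print(z, f_cal, f_uncal)
```

A small residual yields a larger density under the calibrated hypothesis, while a large residual is far more plausible under the uncalibrated hypothesis; this is the mechanism by which the residuals drive the hypothesis conditional probabilities.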

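The measurement conditional PDFs can be evaluated for a particular assumed
$\mathsf{R}$, and thus $f_{\mathrm{cal}}(t_i)$ and $f_{\mathrm{uncal}}(t_i)$ are evaluated as numbers once $\mathsf{z}(t_i) = z_i$ becomes
available. Then, the probability that the sextant is calibrated or uncalibrated at
time $t_i > t_1$ is⁶

$$p_{\mathrm{cal}}(t_i) = \frac{f_{\mathrm{cal}}(t_i)\,p_{\mathrm{cal}}(t_{i-1})}{f_{\mathrm{cal}}(t_i)\,p_{\mathrm{cal}}(t_{i-1}) + f_{\mathrm{uncal}}(t_i)\,p_{\mathrm{uncal}}(t_{i-1})}$$  (1.33)

$$p_{\mathrm{uncal}}(t_i) = \frac{f_{\mathrm{uncal}}(t_i)\,p_{\mathrm{uncal}}(t_{i-1})}{f_{\mathrm{cal}}(t_i)\,p_{\mathrm{cal}}(t_{i-1}) + f_{\mathrm{uncal}}(t_i)\,p_{\mathrm{uncal}}(t_{i-1})}$$  (1.34)
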


⁵ A sequence of white residuals is one of the criteria that indicates that the assumed filter model
is in consonance with the real world or with the truth model in the case of a simulation [129] as
discussed in Section 2.3.3.3.

⁶ Recall that we assumed that the sextant was initially calibrated; thus $p_{\mathrm{cal}}(t_1) = 1$ and
$p_{\mathrm{uncal}}(t_1) = 0$.

These two probabilities are nonnegative and the denominator in Equations (1.33)
and (1.34) serves to scale the probabilities so that they sum to one. For the first
measurement, we assumed that the sextant was calibrated; thus we had $p_{\mathrm{cal}}(t_1) = 1$
and $p_{\mathrm{uncal}}(t_1) = 0$. Using Equations (1.33) and (1.34), we can calculate the hypothesis
conditional probabilities at time $t_2$ as

$$p_{\mathrm{cal}}(t_2) = \frac{f_{\mathrm{cal}}(t_2) \cdot 1}{f_{\mathrm{cal}}(t_2) \cdot 1 + f_{\mathrm{uncal}}(t_2) \cdot 0} = \frac{f_{\mathrm{cal}}(t_2)}{f_{\mathrm{cal}}(t_2)} = 1$$  (1.35)

$$p_{\mathrm{uncal}}(t_2) = \frac{f_{\mathrm{uncal}}(t_2) \cdot 0}{f_{\mathrm{cal}}(t_2) \cdot 1 + f_{\mathrm{uncal}}(t_2) \cdot 0} = \frac{0}{f_{\mathrm{cal}}(t_2)} = 0$$  (1.36)


Thus, $p_{\mathrm{cal}}(t_i) = 1$ and $p_{\mathrm{uncal}}(t_i) = 0$ for all time $t_i$. This presents a problem with a
known remedy. An elemental filter is virtually removed from the filter bank if its
hypothesis conditional probability goes to zero. By inspection of Equations (1.33)
and (1.34), we see that a probability can no longer change at any later time once
it becomes zero. A popular way to counteract this “lock-out” problem, which is
inherent in computing the probabilities using Equations (1.33) and (1.34), is to
introduce some additional logic into the algorithm.
One technique that has been used with success imposes a lower bound⁷ on the
value that the hypothesis conditional probabilities may assume, thus prohibiting them
from being driven to zero. We shall choose one tenth as our lower bound for this
example. Thus we need to recompute the calibrated and uncalibrated hypothesis
conditional probabilities given in Equations (1.35) and (1.36) using $p_{\mathrm{cal}}(t_1) = 0.9$
and $p_{\mathrm{uncal}}(t_1) = 0.1$. Hence,

$$p_{\mathrm{cal}}(t_2) = \frac{f_{\mathrm{cal}}(t_2) \cdot 0.9}{f_{\mathrm{cal}}(t_2) \cdot 0.9 + f_{\mathrm{uncal}}(t_2) \cdot 0.1}$$  (1.37)

$$p_{\mathrm{uncal}}(t_2) = \frac{f_{\mathrm{uncal}}(t_2) \cdot 0.1}{f_{\mathrm{cal}}(t_2) \cdot 0.9 + f_{\mathrm{uncal}}(t_2) \cdot 0.1}$$  (1.38)



If $p_{\mathrm{cal}}(t_2) \ge 0.1$ and $p_{\mathrm{uncal}}(t_2) \ge 0.1$, then we have acceptable probabilities and we
are done with this time step. However, if either inequality fails to be true, then we
round the too-low probability up to 0.1 and round the other probability down to 0.9.
Note that this lower bounding process becomes more involved when we have more
than two elemental filters: once the lower bounds are imposed, all probabilities must
be rescaled to ensure that their sum is one. We then proceed to time $t_3$ in the same
fashion.

⁷ See Section 2.4.6 for more information on lower bounding the hypothesis conditional probability.
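
The recursion of Equations (1.33) and (1.34), together with the lower bound, can be sketched as follows. The function names are hypothetical, and the rescaling rule used for more than two filters (clamp the too-low probabilities, then scale the remaining ones proportionally so the total is one) is an assumption made for illustration; the text above only states that the probabilities must be rescaled.

```python
def apply_lower_bound(probs, lower_bound):
    """Clamp probabilities below lower_bound, rescaling the others to sum to one.

    The proportional rescaling of the unclamped entries is an assumed rule.
    """
    n = len(probs)
    if lower_bound * n > 1.0:
        raise ValueError("lower_bound is too large for this many filters")
    result = list(probs)
    clamped = [False] * n
    while True:
        too_low = [i for i in range(n) if not clamped[i] and result[i] < lower_bound]
        if not too_low:
            return result
        for i in too_low:
            clamped[i] = True
            result[i] = lower_bound
        free = [i for i in range(n) if not clamped[i]]
        free_mass = 1.0 - lower_bound * (n - len(free))
        free_sum = sum(result[i] for i in free)
        for i in free:
            result[i] *= free_mass / free_sum


def update_probabilities(priors, likelihoods, lower_bound=0.1):
    """One step of Equations (1.33)-(1.34), followed by lower bounding."""
    weighted = [f * p for f, p in zip(likelihoods, priors)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("all weighted likelihoods are zero")
    posteriors = [w / total for w in weighted]
    return apply_lower_bound(posteriors, lower_bound)


# Without a bound, a zero prior stays at zero no matter what the data say (lock-out).
print(update_probabilities([1.0, 0.0], [0.01, 0.5], lower_bound=0.0))
# With the bound of one tenth, the "uncal" hypothesis can recover.
print(update_probabilities([0.9, 0.1], [0.0006, 0.0234]))
```

In the two-filter case this reduces to the rule stated above: a probability that falls below 0.1 is raised to 0.1 and the other is set to 0.9.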

We compute the hypothesis conditional probabilities to determine whether the
sextant is calibrated or uncalibrated. If the drop did not affect the calibration
of the sextant, then it is likely that $p_{\mathrm{cal}}(t_i) > p_{\mathrm{uncal}}(t_i)$ and the best position estimate
will use the “cal” filter since it has the highest hypothesis conditional probability.
But since our two models were chosen rather arbitrarily from a continuous set of
possibilities, a blended estimate based on the position estimates from both elemental
filters might be the best approach, with blending weights given by the hypothesis
conditional probabilities. Note that the parameter estimate will fall at the end points
or in between the values used in the models; i.e., we don’t extrapolate outside of the
filter bank. Thus, choosing a small $R_{\mathrm{cal}}$ value for a tightly calibrated sextant and a
high $R_{\mathrm{uncal}}$ value for a very uncalibrated sextant might produce the best results.

The Bayesian state estimate is readily computed via the MMAE. The Bayesian
estimate represents the minimum mean-squared error blending of information
provided by the MMAE in the form of position estimates and hypothesis conditional
probabilities for each elemental filter. The hypothesis conditional probabilities are
used to weight each position estimate as shown

$$\hat{x}_{\mathrm{MMAE}}(t_i) = \hat{x}_{\mathrm{cal}}(t_i)\,p_{\mathrm{cal}}(t_i) + \hat{x}_{\mathrm{uncal}}(t_i)\,p_{\mathrm{uncal}}(t_i)$$  (1.39)

We could also form an estimate of the measurement variance based on these two
filters. If, for instance, the sextant was slightly damaged and thus partially uncalibrated,
this might reveal itself in the measurement residuals, and thus

$$\hat{R}_{\mathrm{MMAE}}(t_i) = R_{\mathrm{cal}}\,p_{\mathrm{cal}}(t_i) + R_{\mathrm{uncal}}\,p_{\mathrm{uncal}}(t_i)$$  (1.40)

would give us a better estimate than either filter by itself.
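
The probability-weighted blending in Equations (1.39) and (1.40) amounts to a weighted sum, as the short sketch below shows. The helper name `blend` and the numerical values are assumptions chosen only for illustration.

```python
def blend(values, probabilities):
    """Probability-weighted sum, in the form of Equations (1.39) and (1.40)."""
    return sum(v * p for v, p in zip(values, probabilities))


# Hypothetical elemental-filter outputs and hypothesis conditional probabilities.
x_cal, x_uncal = 32.67, 32.18
R_cal, R_uncal = 1.0, 9.0
p_cal, p_uncal = 0.9, 0.1

x_mmae = blend([x_cal, x_uncal], [p_cal, p_uncal])  # blended position estimate
R_mmae = blend([R_cal, R_uncal], [p_cal, p_uncal])  # blended variance estimate
print(x_mmae, R_mmae)
```

Because the weights are nonnegative and sum to one, both blended values lie between the corresponding elemental values, which is the "no extrapolation outside the filter bank" property noted earlier.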
In this section we have shown how two Kalman filters can be used in the
MMAE framework to supply the positional estimates needed to navigate north to the
Port B latitude better than one filter can when the precision of the sensor measurements
is in doubt. The derivations of some of these equations will be undertaken in the
next chapter, while most are simply stated with reference to where the reader may
find the derivation. An overview of the advanced topics used in this research is next.

1.4 Advanced Topics Overview

The linear systems theory state space approach [217, 164, 155] for solving
systems of linear differential equations was just taking root when Kalman [95] reported
his new approach for linear filtering and prediction over forty years ago. These
foundations have been further refined and extolled by many researchers in
engineering and applied mathematics studying the complementary fields of
mathematical optimization, information theory, signal processing, estimation, system
identification, and control, such as Kalman, Bucy, Falb, Arbib, Sorenson, and others
[97, 183, 182, 6]. In this research, we will be mainly interested in the extensions
of linear systems filtering theory to infinite dimensions [38, 18, 19, 39]⁸, since all
real physical systems are truly distributed [163, 184, 185]. The bulk of the work
on infinite-dimensional systems theory has been reported for specific problem types,
such as the solution to a parabolic PDE (see, for example, [161, 18, 19]), while in
this research we make no assumptions on the particular type of infinite-dimensional
system during our derivation of the ISKF. However, discussions of topics such as
observability are largely problem-specific and usually apply to subclasses of
problems such as various types of PDEs [38, 163, 184, 185, 18, 19]. We will not report
any results with regard to the very important topic of observability or its dual,
controllability. Other researchers have extended the study of infinite-dimensional linear
systems theory to functional differential equations, as reported in [103].
Additionally, the semigroup theory has been steadily developed to characterize the solutions
to a multitude of initial value or abstract Cauchy problems (ACP) that feature the
evolution equation [83, 216, 160, 48]. During the past thirty years, mathematical
control theorists have studied and reported on the evolution equation formulation
[38, 160, 39]. Probability theory, stochastic processes, and stochastic calculus, as
developed in [42, 40, 66, 45], are needed to characterize fully the additive noise
processes perturbing our evolution equations representing the dynamics model as well
as the measurement model equations. Finally, our development of the ISKF is framed
and executed using the tools of functional analysis [83, 154, 36, 37, 24].

⁸ Other than the seminal paper by Kalman, most of the references cited in this section are simply
good references that have been used by the author during this research and may or may not be
(although they sometimes are) the first or definitive work on the subject.

While Chapter II gives a thorough background for the MMAE, it provides
only brief descriptions of some of the advanced material mentioned above. All of the
advanced topics given above are developed to the level necessary for the derivation of
the ISKF and the algorithm for creating the equivalent infinite-dimensional
discrete-time model in Chapter III. Furthermore, we employ a Galerkin-like method⁹ to
create an essentially-equivalent discrete-time model for our stochastic PDE problem
in Chapter IV. But first, we conclude this chapter with some notes to the reader, a
summary, and a fuller outline of the rest of the dissertation.

⁹ See [63, 89, 62, 30, 61] for an accounting of the Galerkin method that has been used to find
approximate solutions to PDEs.

1.5 Notes to the Reader

1. References: When multiple references are cited in the text, the ordering is
generally chronological, from the first source to the most recent. The
primary exception to this rule occurs when a source is difficult to obtain and
a more recent source adequately explains the topic (sometimes more clearly
than the original source) and gives proper credit to the original source; such a
source will be listed first. This will usually occur without further clarification.
On the other hand, the bibliography lists the references in strictly alphabetical
order by author and chronologically for each author with multiple
contributions.

2. Notations: Due to the confluence of several engineering and mathematical
notational conventions, we have at times created a new or slightly altered
symbol for a common quantity or concept so that a single symbol rarely has
multiple meanings that must be ascertained from the context. Oftentimes the
font type is used to differentiate two similar symbols. For the most part, the
notational convention used by Maybeck [129, 130] is used; e.g., $\mathsf{x}$ is a random
state variable as indicated by the sans serif font used, while a realization of
$\mathsf{x}$ is denoted by $\mathsf{x}(\omega) = x$. A second example calls on the Hilbert space, $\mathbb{H}$,
the measurement distribution operator, $H$, and the measurement distributor
matrix, $\mathbf{H}$; all are typeset using upper case lettering, but with different font
types. Another good example is the set of real numbers, $\mathbb{R}$, the measurement
residual vector, $\mathbf{r}$, the measurement noise covariance matrix, $\mathbf{R}$, the range of a
mapping, $\mathcal{R}$, and the random measurement noise covariance $\mathsf{R}$ we just saw in
the navigation example. In addition to the general rules that follow, see the
complete list of symbols beginning on page xvii.

>  Scalars  are  denoted  by  both  upper  and  lower  case  letters  in  italic  type  for 
Roman letters and lower case only for Greek letters. For example, $j$, $N$, and $k$
are all scalars.

> Vectors are denoted by lower case letters in boldface type, such as x or at.

> Matrices are denoted by upper case letters in boldface type like $\mathbf{R}$ and $\boldsymbol{\Phi}$.
Additionally, an $n \times m$ matrix, for $m = 1$ or $n = 1$, while technically a vector
or the transpose of a vector, will sometimes retain the boldface upper case
typesetting when there is more to gain by using the familiar typeset.

>  Functions  are  generally  denoted  in  the  same  fashion  as  are  scalars;  however, 
when a function is treated as an element in an infinite-dimensional vector space,
it is often written as a vector in boldface type. For example, $f$ and $\mathbf{x}$ are both
functions; $f$ is a scalar function, while $\mathbf{x}$ is generally an $n$-vector of functions;
however, $n$ could be one.

> Transformations and operators are denoted by upper case letters in italic,
as in $F$ for transformations (and operators) that are associated with Kalman
filtering, and calligraphy type script for other ‘standard’ operators such as the
conditional expectation operator $\mathcal{E}$. The only reason for this difference is
simply to maintain the aesthetics of the notation developed for finite-dimensional
filtering.

> Sets are denoted by upper case double-lined blackboard type: $\mathbb{A}$ and $\mathbb{X}$.
Special sets such as the $\sigma$-field, $\mathcal{A}$, are usually denoted using the calligraphy
font.

>  Set  operators  are  often  denoted  by  a  second  calligraphy  type.  Two  examples 
are  23  and 

>  Random  vectors  and  vector  stochastic  processes  are  set  in  boldface  sans  serif 
type, e.g., $\boldsymbol{\mathsf{x}}$.

> Random variables and scalar stochastic processes are set in sans serif type,
as in $\mathsf{x}$.

> Realizations of the random vector are set in boldface roman type, $\boldsymbol{\mathsf{x}}(\omega) = \mathbf{x}$,
while its scalar components (realizations of random variables) are denoted in
italics as $x_k$, for $k = 1, 2, \ldots$.

> Similarly, samples of the stochastic vector process are set in boldface roman
type, $\boldsymbol{\mathsf{x}}(t, \omega_i) = \mathbf{x}(t)$, while its scalar components (samples of stochastic scalar
processes) are denoted by $x_k(t)$, for $k = 1, 2, \ldots$.

>  Finally,  a  few  special  operators  are  given  in  standard  form:  the  integral, 
$\int x(t)\,dt$; the sum, $\sum_i x_i$; the product, $\prod_i x_i$; the intersection, $\bigcap_i A_i$.
1.6 Summary

In this chapter we introduced the Kalman filter and the multiple model
adaptive estimation (MMAE) technique, which is used to extend Kalman filtering in two
ways. Using a bank of elemental filters, the MMAE method can improve state
estimation by readily adapting to an unknown noise environment or by identifying
uncertain system parameters. Additionally, it can be tuned to focus on
performing system identification (also known as parameter estimation) in either a known
or unknown noise environment. Previous researchers have concentrated on
finite-dimensional problems; this research expands the class of problems to encompass
problems more accurately modeled using infinite-dimensional state space
descriptions. Much of the previous work on distributed-parameter systems and systems
featuring time-delayed differential equation models has relied on ad hoc methods to
solve these problems. This research puts these sorts of problems on firm theoretical
ground. While one must make approximations at some point to produce an
algorithm to run on a digital computer, these approximations occur later in the design
process presented herein, and they can be optimized for the computational load or
whichever criterion is most important for the application.

1.7  The  Rest  of  the  Dissertation 

In Chapter II we give a lengthy accounting of MMAE techniques. While only
a small portion of this extensive review is necessary for understanding the fixed-bank
MMAE method employed in this research, it is included as a useful survey for the
reader interested in extending this line of research to cases where more robust
moving-bank variants of MMAE are essential to improved estimation performance or
are needed in response to tighter restrictions on the computational loading.

In Chapter III we derive the infinite-dimensional sampled-data Kalman
filter (ISKF). We also develop a method (analogous to the technique used for
finite-dimensional systems as described in the previous chapter) for creating the equivalent
infinite-dimensional discrete-time model from the infinite-dimensional
continuous-time model. Finally, we extend the structure of the fixed-bank MMAE framework so that
it may accept the ISKF, thus creating the generalized infinite-dimensional MMAE
(GIMMAE).

In Chapter IV we demonstrate our new and modified techniques in an extended
example using the stochastic heat equation. To demonstrate the power and
advantages of the methodology developed in the previous chapter, we devote a considerable
portion of this work to simulating, in Chapter V, the state estimation performance of
an MMAE populated with Kalman filters based on the essentially-equivalent
finite-dimensional discrete-time model we created. The MMAE demonstrates good state
estimation performance while operating in an uncertain noise environment;
additionally, the MMAE is shown to be capable of quickly performing system identification.

In  Chapter  VI  we  begin  by  reviewing  the  contributions  of  this  research.  Next, 
we  draw  some  conclusions,  and  finally,  we  offer  recommendations  for  future  work. 
II. Multiple Model Adaptive Estimation


2.1 Introduction

As we saw in Chapter I, there are many problems that can be adequately
modeled by stochastic differential equations, upon which a Kalman filter or set of
Kalman filters could be based. When there are uncertain model parameters, a group
of Kalman filters, such as a parallel bank of filters, acting in concert, generally
provides a better state estimate than a single filter. In addition to providing a
superior state estimate, the filter bank structure provides good estimates of the
uncertain model parameters. To that end, we endeavor to describe and discuss
the estimation framework known as multiple model adaptive estimation (MMAE).
Additionally, many enhancements to the basic structure are presented in this chapter.
Multiple model methods have been employed in the following areas: target tracking
[140, 14, 137], aided inertial navigation systems [50, 144], sensor and actuator failure
detection and identification [56], aircraft and space structures (guidance and control)
[65, 171], drug infusion¹ [127, 215], and chemical process control [173].

¹ Drug infusion is a diffusion process with time-varying parameters; hence it is one of the areas
that may benefit from the infinite-dimensional approach developed in this research.

Only the first four sections of this background chapter are necessary to
understand the formative stages and the basic fundamental concepts of MMAE as applied
in this research. Following this introductory section, we give a synopsis of the early
contributions that laid the groundwork for today’s MMAE research. Next, the
structure and components of MMAE are presented. The final section of essential reading
contains an extensive collection of practical enhancements that improve
the performance of the MMAE. Section 2.5 provides an excellent example of how
a single filter relates to a bank of filters. Lastly, Section 2.6 introduces a class of
dynamic filter bank techniques used to increase the breadth of the filter bank
while retaining the resolution of a narrowly focused static bank of filters.

2.2  The  Beginnings  of  Multiple  Model  Adaptive  Estimation  and  Control 

In this section, we highlight the first twenty years of MMAE-related research
appearing in the literature. This research laid the groundwork for the multiple
model methodology that has blossomed over the past forty years since Magill [125]
proposed his method for employing a set of Kalman filters [95] in an uncertain noise
environment. While we have assumed a certain familiarity with Kalman filtering,
we have described and discussed in depth the dynamics and measurement models²,
as well as the filtering algorithm itself. A significant cross-section of the available
literature is cited as we endeavor to prepare a solid foundation for a deep coverage of
the multiple model methodology and associated techniques that are the main subjects
of this chapter. This first subsection is organized chronologically to emphasize the
growth and increasing sophistication of the research area over time.

² In the estimation research discussed in this dissertation, the operation of the system or plant
is governed by a dynamics model, while the system behavior is imperfectly observed using a
measurement model.

In 1965, Magill [125] presented a novel state estimation technique for a sampled
Gauss-Markov stochastic process. He employed multiple models to address the
problem of unknown parameter variation from within a finite set of known values. For
this inaugural work, the parameters described the statistics of the dynamics driving
noise, which for a time-invariant system model are stationary. Each of the
parameter values corresponded to a hypothesis of the real world with a stationary noise
process and was used to construct an elemental filter. These hypotheses were tested
by computing the conditional probability of each hypothesis being correct. These
probabilities are conditioned on the observed measurements for each filter. Magill
established the notion that a random “switch” selects the elemental stochastic
process that is in force. Using the hypothesis conditional probabilities as weights, he
formed a weighted sum of the state estimates from each filter; therefore the estimate
is a blend of all of the models hypothesized to represent the real world system. This
estimate is optimal provided that one of the assumed models matches the physical
process when the unknown parameter is a constant vector. Furthermore, he asserted
that a sufficient condition for this optimality is: if all of the elemental stochastic
processes are ergodic, then the weighting coefficients will converge with probability
one to unity for the true process and to zero for the others.

Early  in  1969,  Hillborn  and  Lainiotis  [82]  extended  Magill’s  work  from  the  case 
of  scalar  measurements  to  the  vector  measurement  case  and  presented  an  optimal 
conditional  mean  estimator  based  on  unknown  (but  constant)  parameters.  They 
state  that,  under  certain  necessary  and  sufficient  conditions,  in  a  Bayesian  sense,  the 
optimality  of  their  state  estimate  is  independent  of  the  convergence  to  the  value  of 
the unknown parameter. Hence, at each step their state estimate is optimal, whereas
Magill’s  estimator  is  only  optimal  if  the  true  value  of  the  parameter  precisely  matches 
one  of  the  elemental  filters  in  the  bank. 

Later in 1969, Sengbush and Lainiotis [174] proposed a binary method to quantize
the parameter space efficiently; the discretization must be fine enough that the
true parameter value can be accurately estimated but, since computer resources are
finite, also sufficiently coarse. Their technique is itself iterative in nature and
only requires two quantization levels for each parameter (of the uncertain parameter
vector) being estimated.

In 1970, Ackerson and Fu [2] generalized Magill’s work when they proposed a
method to extend the discrete Kalman-Bucy filter by allowing a nonstationary noise
process consisting of a group of Gaussian distributions to drive the filter. They
allowed the input and noise process statistics to change (or switch) in discrete “jumps”
according to a Markov transition process matrix. Hence the system can be characterized
by a model with Markov switching parameters. Whereas Magill’s formulation
employed multiple models to identify the statistics of a static or unchanging system,
this work used multiple models to characterize a dynamic, time-varying system.
Since the optimal algorithm requires an ever-growing amount of memory3, they
proposed a suboptimal finite-memory estimator that assumes that the hypothesis
conditional probability is normally distributed when in fact it is a sum of Gaussians
that grows exponentially with increasing time.

In 1971, Lainiotis [107] showed how an estimator, from a class of nonlinear
adaptive estimators, may be decomposed into a nonadaptive part (the bank
of Kalman or Kalman-Bucy filters) and a nonlinear adaptive part which is tasked
with identifying the “mode” of the system. A weighted sum of hypothesis conditional
probabilities is used to identify the true system mode. He applied this decomposition
to the problem of state estimation with non-Gaussian initial state.

In  1973,  Moose  and  Wang  [151]  proposed  modeling  the  modes  or  states  of 
the  system  with  a  semi-Markov  process.  In  a  semi-Markov  process,  the  transitions 
between  states  are  dictated  by  the  familiar  Markov  transition  matrix;  however,  the 
amount  of  time  spent  in  the  current  state  before  switching  to  the  next  state  is  a 
random  variable,  i.e.,  not  constant  as  with  a  Markov  process.  They  claimed  that  this 
modification  completely  solved  the  problem  of  needing  increasing  computer  storage 
capacity  with  increasing  time.  Two  years  later,  Moose  [150]  applied  this  formulation 
to  the  maneuvering  target  problem. 

In 1974, Fry and Sage [59] employed the hierarchical estimation theory developed
by Smith and Sage [179] to reduce the computational requirements of Magill’s
method4. The hierarchical approach is used to decompose a complex system into


3To characterize the parameter history fully at time $t_i$ requires $K^i$ hypotheses, and thus
$K^i$ elemental filters are required. For example, at time $t_1$, we have, for all intents and
purposes, the constant parameter case and thus we need just $K$ elemental filters: one for each
of the values that the parameter may assume. Then, at time $t_2$, we require $K^2$ elemental
filters since there are now $K^2$ possible parameter value trajectories, and so on. Consequently,
the number of elemental filters, and hence the memory required, grows exponentially when we
desire an elemental filter that can exactly match the parameter’s time history.

4This  paper  by  Fry  and  Sage  contains  an  excellent  review  of  Magill’s  paper;  see  also,  Maybeck’s 
second  volume  for  a  review  of  the  multiple  model  adaptive  filter  [130]. 
several simple subsystems. The melding of these two techniques enables the application
of multiple model methods to hierarchically structured problems that would
otherwise require an insurmountable computational load.

In  1976,  Lainiotis  unified  many  of  the  ideas  regarding  partitioned  or  MMAE 
techniques  in  [108]  and  presented  the  idea  of  cascading  controllers  with  the  elemental 
filters  [109].  Thus,  multiple  model  adaptive  control  (MMAC)  was  born. 

In  1976,  Hawkes  and  Moore  [75,  76]  reported  two  important  results.  They 
calculated  an  upper  bound  for  the  mean-squared  error  obtained  for  a  finite  parameter 
set  assumption.  Secondly,  they  established  some  necessary  and  sufficient  conditions 
for  exponential  convergence  of  the  Bayesian  estimate  to  the  true  values  in  the  mean- 
squared  error  sense  for  systems  with  measurements  corrupted  by  stationary  zero- 
mean  Gaussian  random  processes. 

In  1977,  a  group  led  by  Athans  [10]  devised  the  first  practical  implementation  of 
MMAE  and  MMAC  in  a  problem  that  failed  to  showcase  the  potential  of  the  multiple 
model  methods  because  the  F-8C  aircraft  flight  controller  did  not  need  an  adaptive 
estimator  or  controller.  Nonetheless,  hundreds  of  researchers  have  contributed  many 
articles  and  books  devoted  to  estimation  (and  control)  using  multiple  model  methods 
in  the  past  thirty  years;  this  large  volume  alone  is  an  indication  of  its  utility  and 
applicability. 

In  1978,  Chang  and  Athans  [34]  proved  that  if  one  of  the  models  in  the  set  of 
K  constant-parameter  elemental  filters  exactly  matched  the  real  world  system,  then 
an  MMAE  based  estimator  was  optimal.  In  the  event  that  we  don’t  discretize  the 
parameter  space  such  that  one  of  the  K  elemental  filter  models  is  the  truth,  then 
we  may  say  that  the  MMAE  will  converge  to  the  closest  hypothesized  model  in  the 
Baram  sense  [16,  15,  175,  177].  However,  no  research  has  yet  given  us  a  guaranteed 
convergence rate [79]. Additionally, Chang and Athans proposed an optimal estimator
using $K^2$ elemental filters for the case in which the unknown parameter vector
is allowed to vary or switch, specifically, when it follows a Markov process, i.e., the
present parameter vector value depends only on the previous parameter vector value.
This estimator is known as a switching parameter algorithm.

In 1979, Tugnait [195] pointed out that Chang and Athans’ development [34]
was actually suboptimal for the Markov parameter case. Tugnait stated that the
Chang and Athans paper approximated the probability density function (PDF) (at
time $t_i$) for each of the $K^2$ elemental filter residual processes using a single Gaussian
PDF. The actual PDF needed to determine the true state optimally for each elemental
filter was a Gaussian mixture: a weighted5 sum of Gaussian PDFs, $K^{i-2}$
in this case.

In  1980,  Tugnait  [196]  investigated  the  behavior  of  a  Bayes  optimal  estimator 
with  unknown  continuous  parameter  vector.  Specifically,  he  studied  the  convergence 
properties  of  a  conditional  mean  estimator  in  which  the  unknown  parameter  is  to 
be  determined  from  an  infinite  countable  set.  He  also  applied  his  results  to  a  linear 
time-invariant  Gauss-Markov  system  model. 

In 1983, Dasgupta and Westphal [41] extended Hawkes and Moore’s [76] convergence
results to include systems with unknown biases. They note that, in simulations,
the multiple model estimator often preferred the zero-mean elemental
filter; hence, implementation of non-zero-mean filters should be considered carefully.

In 1984, Blom [22, 23] proposed a computationally efficient algorithm for filtering
a system characterized by Markov switching parameters, a problem previously
investigated by Ackerson and Fu [2]. The interacting multiple model (IMM) technique
reduces the required number of elemental filters in the filter bank through a
novel hypothesis merging routine.

Additionally,  many  researchers  have  explored  MMAE  methods  to  improve  the 
operating  characteristics  of  a  system  through  feedback  control.  MMAE-based  control 
occurs  when  the  control  action  is  based  on  a  state  estimate  provided  by  the  MMAE 

5The  weights  sum  to  one  so  that  the  mixture  retains  all  the  properties  of  a  PDF. 
estimator; see, for example, Stepaniak and Maybeck [186]. On the other hand, MMAC
employs  a  parallel  bank  of  controllers,  each  matched  to  a  particular  filter  within 
the  bank  of  state  estimators.  As  previously  noted,  Lainiotis  [109]  was  the  first 
to  propose  this  pairing  of  filters  and  controllers;  a  group  led  by  Athans  was  the 
first  to  implement  it  [10].  The  research  developed  in  this  dissertation  is  primarily 
concerned  with  MMAE;  however,  on  occasion,  we  shall  discuss  MMAE-based  control 
and  MMAC  to  shed  additional  light  on  the  estimation  framework  itself. 

2.3  Multiple  Model  Adaptive  Estimation  Fundamentals 

MMAE employs a parallel bank of elemental filters to process noise-corrupted
measurements and recursively identify uncertain parameters, estimate states, and
compute the residuals between model-based measurement predictions and actual
observed measurements. As such, an MMAE algorithm can adapt itself to an uncertain
noise environment, perform parameter (or system mode) identification, and
compute an accurate state estimate. To accomplish these tasks, the MMAE algorithm
processes known inputs and noise-corrupted measurements at discrete times
with a set of parallel elemental filters, which are developed using a mathematical
system model based on a pair of stochastic equations representing the internal state
dynamics and measurement processes. Each of the filters in the bank represents
a possible mode of the system; each filter is designed using a different hypothesis
about the assumed value of the parameters that describe the structure of the
dynamics or measurement models and/or characterize the statistical properties of
the dynamics and measurement noise.

Figure 2.1 Multiple Model Adaptive Estimation

The MMAE framework is shown graphically in Figure 2.1. The noise sources
are not explicitly labeled and the time dependence has been suppressed in this
diagram. The system processes known inputs $u$ and corrupted measurements $z$
with a parallel set of $K$ elemental filters $F_1, F_2, \ldots, F_K$ based on parameter vectors
$a_1, a_2, \ldots, a_K$, respectively. Each elemental filter produces a state estimate $\hat{x}_k$, a
measurement residual $r_k$, and a filter-computed residual covariance matrix $A_k$. The
measurement residuals and covariance matrices are used in block D to compute the
probability $p_k$ of the assumed system mode parameter vector matching the true
parameter. In E, the probabilities are used in conjunction with the individual state
estimates $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_K$ and the known parameters $a_1, a_2, \ldots, a_K$ to estimate the
system state $x$ and identify the system mode parameter vector $a$. The following
sections fill in the details on the mathematical system model, the Kalman filtering
algorithm, the filter bank, the parameter and state estimates, and several key
assumptions driving the MMAE methodology.
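
As a rough illustration of blocks D and E of Figure 2.1, the following Python sketch updates the hypothesis probabilities and blends the elemental estimates for the scalar-measurement case. It assumes the usual Bayesian recursion in which each prior probability is multiplied by a zero-mean Gaussian likelihood of the residual (with the filter-computed residual variance) and the results are normalized; that recursion is not stated in this section, and the function names are hypothetical.

```python
import math


def gaussian_likelihood(residual, variance):
    """Zero-mean scalar Gaussian density evaluated at the residual."""
    return math.exp(-0.5 * residual ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def update_probabilities(prior, residuals, variances):
    """Block D (assumed form): Bayes update of the K hypothesis probabilities."""
    weighted = [p * gaussian_likelihood(r, a)
                for p, r, a in zip(prior, residuals, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]


def blend(probabilities, values):
    """Block E: probability-weighted sum of elemental estimates (or parameters)."""
    return sum(p * v for p, v in zip(probabilities, values))
```

The same `blend` function can be applied to the elemental state estimates or to the hypothesized parameter values, since both are formed as probability-weighted sums.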

2.3.1  Mathematical  System  Model.  The  performance  of  any  model-based 
algorithm  depends  heavily  on  creating  an  accurate  model  of  the  system.  The  real 
world  rarely,  if  ever,  presents  us  with  a  truly  linear  system,  but  over  limited  operating 
regimes,  many  of  the  systems  of  interest  to  us  can  be  adequately  modeled  as  linear. 
Additionally,  the  disturbances  to  the  system  are  often  well  modeled  by  a  vector  of 
additive white Gaussian noise processes. It is often useful to denote both scalar and
vector Gaussian stochastic processes using the probability distribution notation [14]:
$\mathcal{N}[x(t); \mu(t), \Sigma(t)]$, where $\mathcal{N}[\,\cdot\,; \cdot, \cdot]$ is the Gaussian (normal) probability distribution,
$x(t)$ is the vector stochastic process in question at time $t$, $\mu(t)$ the mean vector, and
$\Sigma(t)$ the covariance matrix.

For this research, we shall assume that our linear system model6 has
continuous-time dynamics

$$\dot{x}(t) = F(t)\,x(t) + B(t)\,u(t) + G(t)\,w(t) \qquad (2.1)$$

where $\dot{x}(t) = dx(t)/dt$, with measurements available at discrete times

$$z(t_i) = H(t_i)\,x(t_i) + v(t_i) \qquad (2.2)$$

and is driven by known inputs, $u(t)$, and independent7, zero-mean white8 Gaussian
noise processes, $w(\cdot,\cdot)$ and $v(\cdot,\cdot)$, with known strength $Q(t)$ and covariance $R(t_i)$,
respectively, and

$x(t)$ = $n \times 1$ state vector at time $t$

$F(t)$ = $n \times n$ system dynamics matrix at time $t$

$B(t)$ = $n \times r$ input distributor matrix at time $t$

$u(t)$ = $r \times 1$ control vector at time $t$

$G(t)$ = $n \times s$ noise distributor matrix at time $t$

$w(t)$ = $s \times 1$ Gaussian noise process vector at time $t$

$z(t_i)$ = $m \times 1$ measurement vector at time $t_i$

$H(t_i)$ = $m \times n$ measurement distributor matrix at time $t_i$

$v(t_i)$ = $m \times 1$ Gaussian measurement noise process vector at time $t_i$

Additionally, these independent noise processes have covariance kernels of
$E\{w(t)\,w^T(t')\} = Q(t)\,\delta(t - t')$ and $E\{v(t_i)\,v^T(t_j)\} = R(t_i)\,\delta_{ij}$. Recall that the

6This section is entirely based on Maybeck [129]. Several other books, such as McGarty [141], also
provide this background in a similar notation.

7This  assumption  is  not  required  but  simply  makes  the  presentation  easier;  for  a  discussion  on 
how  to  model  correlated  noise  processes,  see  Chapter  5  of  Maybeck  [129]  or  [181]. 

8A noise process (or a noise sequence) that is independent in time is known as a white random
process (sequence). While continuous-time white processes don’t exist in the real world, this
assumption is well justified when the bandwidth of the true band-limited noise process is much
larger than the system bandwidth [129]. Additionally, time-correlated “colored” noise is treated by
Maybeck [129] (in Chapter 4); see also [181].
Kronecker delta is defined by [7, 154]

$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \qquad (2.3)$$

and the Dirac delta function, $\delta(t)$, is defined as the function that satisfies the following
[129, 28]

$$\int_{-\infty}^{\infty} \delta(\tau)\,d\tau = 1 \quad \text{and} \quad \delta(\tau) = 0 \ \text{for all}\ \tau \neq 0 \qquad (2.4)$$


A more rigorous approach to modeling continuous-time system dynamics9
would employ a true differential equation driven by a Brownian motion (or Wiener)
process, $b(t)$, with diffusion $Q(t)$, versus the more familiar derivative-based Equation
(2.1) driven by a zero-mean white Gaussian noise:

$$dx(t) = [F(t)\,x(t) + B(t)\,u(t)]\,dt + G(t)\,db(t) \qquad (2.5)$$

where $b(t)$ is an $s \times 1$ Brownian motion noise process vector at time $t$, having a
diffusion of $Q(t)$; the hypothetical derivative of $b(t)$ would be the $w(t)$ in Equation
(2.1).

Since our algorithm will be implemented on a digital computer, we require
an equivalent discrete-time model for the system dynamics10. We shall begin with
Equation (2.5) to create a stochastic difference equation; we only report the results
here (see, for example, Maybeck [129] for the proper procedure, or our development
in Section 3.4 for a more general case). Thus our mathematical model of the system
dynamics and measurement process becomes

$$x(t_{i+1}) = \Phi(t_{i+1}, t_i)\,x(t_i) + B_d(t_i)\,u(t_i) + G_d(t_i)\,w_d(t_i) \qquad (2.6)$$

$$z(t_i) = H(t_i)\,x(t_i) + v(t_i) \qquad (2.7)$$

9The  measurement  model  of  Equation  (2.2)  is  unchanged. 

10Here we implicitly assume that the control input $u(t)$ is a piecewise constant function such
that $u(t) = u(t_i)$ for time $t_i \le t < t_{i+1}$.
where

$x(t_i)$ = $n \times 1$ state process vector at time $t_i$

$\Phi(t_{i+1}, t_i)$ = $n \times n$ state transition matrix from time $t_i$ to time $t_{i+1}$

$B_d(t_i)$ = $n \times r$ discrete-time input distributor matrix at time $t_i$

$u(t_i)$ = $r \times 1$ discrete-time control vector at time $t_i$

$G_d(t_i)$ = $n \times s$ discrete-time noise distributor matrix at time $t_i$

$w_d(t_i)$ = $s \times 1$ discrete-time white Gaussian noise process vector at time $t_i$

$z(t_i)$ = $m \times 1$ measurement process vector at time $t_i$

$H(t_i)$ = $m \times n$ measurement distributor matrix at time $t_i$

$v(t_i)$ = $m \times 1$ white Gaussian measurement noise process vector at time $t_i$


and the discrete-time noise distributor matrix is chosen without loss of generality to
be an $n \times n$ identity matrix: $G_d(t_i) = I$.

Since  we  began  with  a  continuous-time  dynamics  model,  the  state  transition 
matrix  must  satisfy  the  following  differential  equation  with  initial  condition 


$$\frac{d\Phi(t, t_0)}{dt} = F(t)\,\Phi(t, t_0), \qquad \Phi(t_0, t_0) = I \qquad (2.8)$$


The  state  transition  matrix  has  several  important  properties: 

1. $\Phi(t, t')$ is uniquely defined for all times $t$ and $t'$ in $[0, \infty)$.

2. Semi-group property: $\Phi(t'', t) = \Phi(t'', t')\,\Phi(t', t)$ for any times $t, t', t'' \in [0, \infty)$
with $t < t' < t''$.

3. Semi-group property, special case: $\Phi(t', t)\,\Phi(t, t') = I$ for any times $t, t' \in
[0, \infty)$ with $t < t'$.

4. Nonsingular11: $\Phi^{-1}(t, t_0) = \Phi(t_0, t)$ for any times $t, t_0 \in [0, \infty)$ with $t > t_0$.


When $F$ is time-invariant, i.e., a constant matrix, then the state transition matrix becomes
a function of the difference of the time arguments and is explicitly represented

11The state transition matrix is guaranteed to be nonsingular when we begin with a continuous-time
system description; this is not necessarily so for naturally discrete-time systems.
as the matrix exponential

$$\Phi(t, t_0) = \Phi(t - t_0) = \exp\{F(t - t_0)\} \qquad (2.9)$$
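
To make the properties of the state transition matrix concrete, the following Python sketch checks them numerically for a double integrator, chosen here purely as an illustrative assumption. For $F = [[0, 1], [0, 0]]$ we have $F^2 = 0$, so the matrix exponential of Equation (2.9) reduces exactly to $I + F\,\Delta t$. The helper names are hypothetical.

```python
def phi(dt):
    """State transition matrix exp(F*dt) for F = [[0, 1], [0, 0]] (F squared is zero)."""
    return [[1.0, dt], [0.0, 1.0]]


def matmul(a, b):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    """Element-wise comparison of two 2-by-2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


identity = [[1.0, 0.0], [0.0, 1.0]]

# Semi-group property: stepping 0.2 and then 0.3 equals one step of 0.5.
semigroup_holds = close(matmul(phi(0.3), phi(0.2)), phi(0.5))

# Inverse property: stepping forward and then backward returns the identity.
inverse_holds = close(matmul(phi(0.4), phi(-0.4)), identity)
```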


The equivalent discrete-time input distributor matrix, $B_d(t_i)$, is found by integrating
the continuous-time input distributor matrix weighted by the state transition matrix
over one sample period so that

$$B_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, \tau)\,B(\tau)\,d\tau \qquad (2.10)$$


The covariance kernel for the zero-mean white12 Gaussian dynamics noise sequence
is given by

$$E\{w_d(t_i)\,w_d^T(t_j)\} = \begin{cases} Q_d(t_i), & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \qquad (2.11)$$

where the continuous-time noise strength $Q(t)$ is used to determine the positive
semi-definite discrete-time noise covariance matrix, $Q_d(t_i)$, expressed as

$$Q_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, \tau)\,G(\tau)\,Q(\tau)\,G^T(\tau)\,\Phi^T(t_{i+1}, \tau)\,d\tau \qquad (2.12)$$
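
For a scalar, time-invariant system the integrals in Equations (2.10) and (2.12) have closed forms, which makes a convenient sanity check. The Python sketch below evaluates them under that assumption (constant scalars F, B, G, Q and a fixed sample period dt) and includes a midpoint-rule integrator for comparison; it is an illustration only, not the general matrix procedure, and the function names are hypothetical.

```python
import math


def discretize_scalar(F, B, G, Q, dt):
    """Return (Phi, Bd, Qd) for dx/dt = F x + B u + G w with constant scalars."""
    if F == 0.0:
        # Limiting case: the transition "matrix" is 1 over the whole interval.
        return 1.0, B * dt, G * G * Q * dt
    phi = math.exp(F * dt)                                    # Equation (2.9)
    Bd = B * (phi - 1.0) / F                                  # Equation (2.10)
    Qd = G * G * Q * (math.exp(2.0 * F * dt) - 1.0) / (2.0 * F)  # Equation (2.12)
    return phi, Bd, Qd


def midpoint_integral(f, a, b, n=1000):
    """Midpoint-rule quadrature, used only to cross-check the closed forms."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))
```

With F = -1, B = G = 1, Q = 2, and dt = 0.1, for instance, the closed-form values can be compared against `midpoint_integral` applied to the integrands of Equations (2.10) and (2.12).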


The covariance kernel for the zero-mean white Gaussian measurement noise sequence
is given by

$$E\{v(t_i)\,v^T(t_j)\} = \begin{cases} R(t_i), & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \qquad (2.13)$$

where $R(t_i)$ is assumed to be positive definite. The discrete-time noise processes
$w_d(\cdot,\cdot)$ and $v(\cdot,\cdot)$ are assumed to be independent random processes. Additionally,
the initial state condition $x(t_0)$ is not known precisely; it will be modeled as a
Gaussian random vector, independent of both noise processes $w_d(\cdot,\cdot)$ and $v(\cdot,\cdot)$,

12A white process is independent in time and thus has a zero (matrix) covariance kernel for the
$t_i \neq t_j$ case; see the second line of Equations (2.11) and (2.13).




with a mean and covariance of

$$E\{x(t_0)\} = \hat{x}_0 \qquad (2.14)$$

$$E\{[x(t_0) - \hat{x}_0][x(t_0) - \hat{x}_0]^T\} = P_0 \qquad (2.15)$$

respectively, where $P_0$ is a positive semi-definite matrix.

For  naturally  discrete-time  systems,  the  dynamics  model  could  be  described 
more  generally  with 

$G_d(t_i)$ = $n \times s$ noise distributor matrix at time $t_i$
$w_d(t_i)$ = $s \times 1$ noise process vector at time $t_i$

where the noise process $w_d(\cdot,\cdot)$ is a zero-mean white Gaussian noise process vector at
time $t_i$ with positive semi-definite $s \times s$ covariance matrix $Q_d(t_i)$. However, all of
the equations in this section and those that follow are invariant with respect to the
dimensions of $G_d$ and $w_d$.

The time-varying model has been used extensively for state estimation problems,
but for identifying parameter variation over a set of constant (or slowly
time-varying) parameters, we will generally employ the following time-invariant system
model equations:

$$x(t_i) = \Phi\,x(t_{i-1}) + B_d\,u(t_{i-1}) + w_d(t_{i-1}) \qquad (2.16)$$

$$z(t_i) = H\,x(t_i) + v(t_i), \qquad (2.17)$$

where the time-invariance property gives $B_d(t_{i-1}) = B_d$ and $H(t_i) = H$, uniform
spacing between time samples yields a state transition matrix independent of time,
i.e., $\Phi(t_i, t_{i-1}) = \Phi$, and stationary noise processes result in $Q_d(t_i) = Q_d$ and
$R(t_i) = R$ for all times $t_i$. In this time-invariant case, these zero-mean white Gaussian
noise processes $w_d(\cdot,\cdot)$ and $v(\cdot,\cdot)$ can be thought of as sequences of independent,
identically distributed (IID) random variables and denoted by $w_d(t_i) \sim \mathcal{N}(0, Q_d)$
and $v(t_i) \sim \mathcal{N}(0, R)$, where the $\mathcal{N}$ indicates that the random vector process has a
normal or Gaussian distribution [159, 170]. Furthermore, a derivation of the steady-state
Kalman filter is based on the time-invariant system model [129].
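
To show how the time-invariant model of Equations (2.16) and (2.17) generates data, the following Python sketch simulates a scalar instance of it. The scalar restriction, the seeded random generator, and the function name `simulate` are assumptions made for illustration.

```python
import math
import random


def simulate(phi, Bd, H, Qd, R, x0, inputs, seed=0):
    """Simulate a scalar instance of Equations (2.16) and (2.17).

    inputs[i] plays the role of u(t_i); Qd and R are the noise variances.
    Returns the lists of states x(t_1), x(t_2), ... and measurements z(t_1), ...
    """
    rng = random.Random(seed)
    x = x0
    states, measurements = [], []
    for u in inputs:
        w = rng.gauss(0.0, math.sqrt(Qd))   # dynamics noise sample w_d
        x = phi * x + Bd * u + w            # Equation (2.16)
        v = rng.gauss(0.0, math.sqrt(R))    # measurement noise sample v
        states.append(x)
        measurements.append(H * x + v)      # Equation (2.17)
    return states, measurements
```

Setting both variances to zero reduces the recursion to its deterministic part, which is a simple way to check the bookkeeping before noise is added.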

For  some  problems  of  interest,  one  or  both  of  the  noise  sources  can  be  discrete 
space-time  point  processes  versus  continuous-time  (for  the  dynamics)  or  discrete-time 
(for  the  measurement  and/or  dynamics)  Gaussian  processes.  In  order  to  preserve 
the  Markov  nature  of  the  state  estimate,  the  noise  process  must  be  an  independent 
increment  process,  i.e.,  the  noise  processes  for  nonoverlapping  (disjoint)  periods  of 
time are independent (see Definition 60 in Chapter III). Thus, the Gaussian property
assumed for the driving noise during the development of the Kalman filter is
not a necessary condition to derive an optimal filter; it is merely sufficient, since the
white Gaussian process is an independent increment process. If the dynamics noise
$w_d$ and/or the measurement noise $v$ is a generalized Poisson point process, we can
still derive an optimal filter. The Snyder filter assumes that the measurement noise
process is a Poisson point process [180]. The Snyder filter was employed by Meer [142]
and others [218, 93, 74] in an MMAE structure to control the pointing and tracking
of particle beams. This will not be pursued further herein.

Some useful notation for describing the stochastic measurement history and its
realization is defined as:

$$Z(t_i) = \begin{bmatrix} z(t_1) \\ z(t_2) \\ \vdots \\ z(t_i) \end{bmatrix} \quad \text{and} \quad Z_i = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_i \end{bmatrix} \qquad (2.18)$$

respectively, where $z_j$ is a convenient notation for a specific realization of the
random vector $z(t_j)$. Note that these vectors “grow” over time.

2.3.2 Kalman Filtering. An optimal solution to the estimation problem
discussed above in Equations (2.6) through (2.15) was given by Kalman in his 1960
paper “A New Approach to Linear Filtering and Prediction Problems” [95]13. The
following year, Kalman and Bucy presented their solution for the continuous-time
problem [96]. A recent derivation of Kalman’s results was presented by Catlin [33]
to the mathematical community in his 1989 book, Estimation, Control, and the
Discrete Kalman Filter, as a “beautiful illustration of functional analysis in action”
in which the projection theorem in a Hilbert space plays a central role14. Recall that
the intent of this research is to explore the utility of employing multiple models in
a parallel structure in order to improve state estimation (or to identify the system
mode parameters themselves); hence we shall only develop those concepts directly
related to the structure. We will follow Maybeck’s treatment of the Kalman filter,
an optimal15 recursive data processing algorithm [129]. The Kalman filter recursively
generates the optimal state estimate to the problem posed above with a two-stage
process: first, it predicts the state for time $t_i$ using only the dynamics model and
the measurements up through time $t_{i-1}$, and then it corrects or updates the estimate
with noise-corrupted measurements16.

13Kalman’s original paper has been republished in many collections, such as the Kalman filtering
collection edited by Sorenson [183] and another collection edited by Başar that focuses on control
theory [12].

14The infinite-dimensional sampled-data Kalman filter we derive in Chapter 3 is a more general
beautiful illustration of functional analysis in action.

15The Kalman filter produces the optimal state estimate, $\hat{x}(t_i^+)$, for a stochastic linear system
driven by zero-mean, white Gaussian noise processes with known covariances [129]. In the Bayesian
sense, $\hat{x}(t_i^+)$ is the optimal state estimate because it is the mean, median, and mode of the Gaussian
conditional PDF $f_{x(t_i)|Z(t_i)}(\xi \mid Z_i)$. $\hat{x}(t_i^+)$ minimizes the mean-squared error (MSE) and the
symmetric cost function by virtue of being the conditional mean of the Gaussian conditional PDF.
$\hat{x}(t_i^+)$ is the maximum a posteriori (MAP) state estimate and, when there is no initial state
information, i.e., $P_0^{-1} = 0$, it is also the maximum likelihood (ML) estimate. When the noises are
non-Gaussian, the Kalman filter estimate is the optimal linear estimate: it is the linear minimum
variance unbiased (MVU) estimate; that is, a nonlinear estimator may do better.
And finally, as Kalman originally posed [95], $\hat{x}(t_i^+)$ is the orthogonal projection of the true state
$x(t_i)$ onto the subspace spanned by the random measurement history $Z(t_i)$, i.e., $\hat{x}(t_i^+)$ satisfies the
projection theorem [122] and is thus the optimal estimate of $x(t_i)$ given measurements $Z(t_i)$.

16For nonlinear models, the extended Kalman filter (EKF) is an appropriate tool; see Maybeck
[130] or Sworder and Boyd [193] for information on the EKF and other nonlinear filters.

According to the Bayesian viewpoint espoused by Maybeck [129], the sampled-data
Kalman filter algorithm consists of (initializing and then) recursively propagating
and updating the state conditional PDF. The sampled-data Kalman filter
algorithm is [129]:

1. Initialize the Gaussian PDF, $f_{x(t_0)}(\xi)$. The initial state is modeled by a Gaussian
random vector with mean and covariance given by the initial state estimate
$\hat{x}(t_0)$ and initial covariance estimate $P(t_0)$:

$$\hat{x}(t_0) = E\{x(t_0)\} = \hat{x}_0 \qquad (2.19)$$

and

$$P(t_0) = E\{[x(t_0) - \hat{x}_0][x(t_0) - \hat{x}_0]^T\} = P_0 \qquad (2.20)$$


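2. Propagate the Gaussian conditional PDF. The propagation is entirely based
on the known internal dynamics model conditioned on the observed measurements.
This stage predicts the state estimate at time $t_i^-$ (see footnote 17) given the optimal
estimate at time $t_{i-1}^+$:

$$\begin{aligned} \hat{x}(t_i^-) &= E\{x(t_i) \mid Z(t_{i-1}) = Z_{i-1}\} \\ &= \Phi(t_i, t_{i-1})\,\hat{x}(t_{i-1}^+) + B_d(t_{i-1})\,u(t_{i-1}) \end{aligned} \qquad (2.21)$$

and

$$\begin{aligned} P(t_i^-) &= E\{[x(t_i) - \hat{x}(t_i^-)][x(t_i) - \hat{x}(t_i^-)]^T \mid Z(t_{i-1}) = Z_{i-1}\} \\ &= \Phi(t_i, t_{i-1})\,P(t_{i-1}^+)\,\Phi^T(t_i, t_{i-1}) + G_d(t_{i-1})\,Q_d(t_{i-1})\,G_d^T(t_{i-1}) \end{aligned} \qquad (2.22)$$

where the expectation is taken with respect to the conditional PDF
$f_{x(t_i)|Z(t_{i-1})}(\xi \mid Z_{i-1})$.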
3. Update the Gaussian conditional PDF. Update the state and covariance estimates
at time $t_i$ with the latest measurement $z(t_i, \omega_j) = z_i$ to produce $\hat{x}(t_i^+)$
17Time $t_i^-$ represents the time $t_i$ just prior to measurement update; some authors [170] use the
notation $t_i|t_{i-1}$, while others [13] often use only the indices, i.e., $i|i-1$. Time $t_i^+$ represents
the time $t_i$ just after measurement update; some authors write this as $t_i|t_i$ or $i|i$.
and $P(t_i^+)$. The filter-computed residual covariance $A(t_i)$ and Kalman gain
$K(t_i)$ are computed first as:

$$A(t_i) = H(t_i)\,P(t_i^-)\,H^T(t_i) + R(t_i) \qquad (2.23)$$

and

$$K(t_i) = P(t_i^-)\,H^T(t_i)\,A^{-1}(t_i) \qquad (2.24)$$
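Using the conditional PDF, $f_{x(t_i)|Z(t_i)}(\xi \mid Z_i)$, we can compute the optimal state
estimate18 given by

$$\begin{aligned} \hat{x}(t_i^+) &\triangleq E\{x(t_i) \mid Z(t_i) = Z_i\} \\ &= \hat{x}(t_i^-) + K(t_i)\,r(t_i) \end{aligned} \qquad (2.25)$$

where the Kalman filter residual $r(t_i)$ is

$$r(t_i) = z_i - H(t_i)\,\hat{x}(t_i^-) \qquad (2.26)$$

where $H(t_i)\,\hat{x}(t_i^-)$ is the predicted measurement, sometimes denoted $\hat{z}(t_i^-)$, and
the updated error covariance is

$$\begin{aligned} P(t_i^+) &= E\{[x(t_i) - \hat{x}(t_i^+)][x(t_i) - \hat{x}(t_i^+)]^T \mid Z(t_i) = Z_i\} \\ &= P(t_i^-) - K(t_i)\,H(t_i)\,P(t_i^-) \end{aligned} \qquad (2.27)$$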

4.  Return  to  step  2 
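
The propagate and update steps above can be sketched for a scalar state and scalar measurement as follows. This is a minimal illustration assuming $G_d = 1$ and scalar quantities throughout; the function names are hypothetical, and a matrix implementation would replace the products and the division with matrix products and an inverse.

```python
def propagate(x_plus, P_plus, phi, Bd, u, Qd):
    """Step 2: propagate the estimate and covariance to the next sample time."""
    x_minus = phi * x_plus + Bd * u        # state propagation, Equation (2.21)
    P_minus = phi * P_plus * phi + Qd      # covariance propagation with G_d = 1
    return x_minus, P_minus


def update(x_minus, P_minus, z, H, R):
    """Step 3: incorporate the measurement z taken at the current sample time."""
    A = H * P_minus * H + R                # residual covariance, Equation (2.23)
    K = P_minus * H / A                    # Kalman gain, Equation (2.24)
    r = z - H * x_minus                    # measurement residual
    x_plus = x_minus + K * r               # updated state estimate
    P_plus = P_minus - K * H * P_minus     # updated error covariance
    return x_plus, P_plus, r, A


def run_filter(x0, P0, phi, Bd, H, Qd, R, inputs, measurements):
    """Steps 1 to 4: initialize, then alternate propagate and update."""
    x, P = x0, P0
    history = []
    for u, z in zip(inputs, measurements):
        x, P = propagate(x, P, phi, Bd, u, Qd)
        x, P, r, A = update(x, P, z, H, R)
        history.append((x, P, r, A))
    return history
```

The residual `r` and its variance `A` are returned alongside the estimate because an MMAE structure needs both from every elemental filter.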

2.3.3  Filter  Bank.  A  natural  way  to  extend  the  concept  of  state  estimation 
using  a  single  filter  into  the  realm  of  joint  state  and  parameter  estimation  is  to  employ 

18The  state  estimate  i(tf)  is  the  sum  of  a  prediction  x(f“),  which  is  a  sufficient  statistic  for 
the  state  x(t,)  given  Z(t»_i)  [100],  and  a  correction  term  K (f,;)r(b)  which  represents  the  “new 
information”  provided  by  the  current  measurement. 


(2.25) 


(2.26) 
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a  parallel  bank  of  filters,  i.e.,  a  filter  bank.  Observe  that  the  conditional  PDF  for  the 
state  (at  measurement  update)  can  be  interpreted  as  a  marginal  conditional  PDF 
computed  from  the  joint  conditional  PDF,  i.e., 

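\[ f_{\mathbf{x}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\mathbf{Z}_i) = \int_{-\infty}^{\infty} f_{\mathbf{x}(t_i),\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}, \boldsymbol{\alpha}|\mathbf{Z}_i)\, d\boldsymbol{\alpha} \tag{2.28} \]

where the system mode $\mathbf{a}(t_i)$ is a random vector and $\boldsymbol{\alpha}$ represents a particular mode
of the system. While the structure of an elemental filter is designed using the same
mathematical system model based on a pair of stochastic equations representing
the internal state dynamics and measurement processes, each mode of the system
is characterized by a unique parameter vector and described probabilistically by
a Gaussian PDF: $f_{\mathbf{x}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\boldsymbol{\alpha}, \mathbf{Z}_i)$, with mean $\hat{\mathbf{x}}(t_i^+)$ and covariance $\mathbf{P}(t_i^+)$ as
computed by a Kalman filter based on the parameter value $\mathbf{a}(t_i) = \boldsymbol{\alpha}$. From the
Bayesian point of view, we are motivated to pursue the joint state and parameter
vector conditional PDF:

\[ f_{\mathbf{x}(t_i),\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}, \boldsymbol{\alpha}|\mathbf{Z}_i) = f_{\mathbf{x}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\boldsymbol{\alpha}, \mathbf{Z}_i)\, f_{\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\alpha}|\mathbf{Z}_i) \tag{2.29} \]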

since  it  features  all  of  the  variables  that  we  are  interested  in  estimating,  given  all  of 
the available measurements. The second factor on the right-hand side of Equation (2.29) is

\[ f_{\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\alpha}|\mathbf{Z}_i) = \sum_{k=1}^{K} p_k(t_i)\, \delta(\boldsymbol{\alpha} - \mathbf{a}_k) \tag{2.30} \]

with $p_k(t_i)$ defined by

\[ p_k(t_i) = \mathrm{pr}\{\mathbf{a}(t_i) = \mathbf{a}_k|\mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.31} \]

where $\mathbf{a}_k$ is the $k$th system mode parameter vector.

Once Equation (2.29) is so established, we can write (via marginal PDFs):


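\[ f_{\mathbf{x}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\mathbf{Z}_i) = \int_{-\infty}^{\infty} f_{\mathbf{x}(t_i),\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}, \boldsymbol{\alpha}|\mathbf{Z}_i)\, d\boldsymbol{\alpha} \tag{2.32} \]

Then applying the sifting property of the Dirac delta function, $\delta(\cdot)$, yields

\[ f_{\mathbf{x}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\mathbf{Z}_i) = \sum_{k=1}^{K} f_{\mathbf{x}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\mathbf{a}_k, \mathbf{Z}_i)\, \mathrm{pr}\{\mathbf{a}(t_i) = \mathbf{a}_k|\mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.33} \]

Hence, the development of the single filter as a lumped expression of a (truly
distributed) system is naturally represented by a filter bank via the total probability
theorem19.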

We will continue to consider and develop these ideas in the sections that follow
in terms of the concept of the uncertain parameter and the parameter space
discretization process, a process which can be viewed using the total probability
theorem. Finally, the state and parameter estimates generated by the MMAE are
given and then the assumptions underlying the (static) multiple model methods are
reviewed. Later in the chapter, in Section 2.4.7, we introduce and briefly discuss
several dynamic multiple model techniques.

2.3.3.1  Parameter Vector.  The first step in building a filter bank
is to identify the parameter vectors which we use to represent the system modes20.
From the discussion in Section 2.3.1, we know that the elements of the matrices
$\boldsymbol{\Phi}(t_i, t_{i-1})$, $\mathbf{B}_d(t_i)$, $\mathbf{G}_d(t_i)$, $\mathbf{H}(t_i)$, $\mathbf{Q}_d(t_i)$, and $\mathbf{R}(t_i)$ describe the structure and/or
characterize the statistics of the dynamics and measurement models given, respectively,
in Equations (2.6) and (2.7). The elements of these matrices can be functions of
a set of quantities that are called the parameters; each scalar parameter can affect
one or more elements of these matrices. Together the parameters, equations, and
other assumptions and comments define the mathematical system model. Our goal
is to estimate a small subset of these parameters (which are assumed to be constant
over time for a static filter bank or allowed to vary slowly21 with time for a dynamic
or moving bank of filters) at each point in time $t_i$; this estimation process is called
parameter identification22.

19Compare this observation with the virtual filter bank discussed in Section 2.5.

20In actuality, the parameter vector represents the portion of the system model which varies and
consequently gives rise to the different system modes.

Specifically, the parameter vector $\mathbf{a}(t_i)$ represents uncertainty in any of the
elements of $\boldsymbol{\Phi}(t_i, t_{i-1})$, $\mathbf{B}_d(t_i)$, $\mathbf{G}_d(t_i)$, $\mathbf{H}(t_i)$, $\mathbf{Q}_d(t_i)$, or $\mathbf{R}(t_i)$. It is important to
note that the uncertainty in $\boldsymbol{\Phi}(t_i, t_{i-1})$, $\mathbf{B}_d(t_i)$, $\mathbf{G}_d(t_i)$, or $\mathbf{Q}_d(t_i)$ may be due to
uncertainty in the continuous-time dynamics structure $\mathbf{F}(t)$. Uncertainties in the
plant noise distributor $\mathbf{G}(t)$ or $\mathbf{G}_d(t_i)$ are treated equivalently as uncertainties in
$\mathbf{Q}_d(t_i)$ and oftentimes, we roll the uncertainties in $\mathbf{H}(t_i)$ into either $\boldsymbol{\Phi}(t_i, t_{i-1})$ or
$\mathbf{B}_d(t_i)$ by an alternative choice of state variables, since we often cannot isolate both
at the same time.

This subset of uncertain parameters is modeled as a slowly varying discrete
random process23 and is denoted by $\mathbf{a}(t_i)$. Oftentimes the choice is obvious, but when
it is not, an empirical study is conducted on the entire list of parameters to determine
the $J$ parameters most crucial for the task at hand; this analysis often depends on
whether we are most interested in identifying the parameter in force, improving the
state estimate, or enhancing control action [175]. The random parameter vector
(representing the system mode that may vary with time) and its realization are
denoted, respectively, by

\[ \mathbf{a}(t_i) = \begin{bmatrix} a_1(t_i) \\ a_2(t_i) \\ \vdots \\ a_J(t_i) \end{bmatrix} \quad \text{and} \quad \boldsymbol{\alpha}(t_i) \triangleq \begin{bmatrix} \alpha_1(t_i) \\ \alpha_2(t_i) \\ \vdots \\ \alpha_J(t_i) \end{bmatrix} \tag{2.34} \]

21Slowly as compared to the dominant time constants of the system or measurement process.

22For a comprehensive explication of the standard methods see, for example, the fine texts by
Sorenson [182], Sage and Melsa [167], and Ljung [121].

23Much of the literature, beginning with Magill [125], takes the approach of calling the parameter
vector a deterministic quantity that is simply a collection of unknown constants and uses an
analysis of the residuals to determine which filter or linear combination of filters best estimates this
parameter value.


Each element of the realization is defined on a subset of the real number line

\[ \alpha_j \in A_j \subset \mathbb{R}, \quad \forall j = 1, 2, \dots, J \tag{2.35} \]

and the entire parameter space is denoted by a product set

\[ A \triangleq A_1 \times A_2 \times \cdots \times A_J \subset \mathbb{R}^J \tag{2.36} \]

where [154]

\[ A_1 \times A_2 \times \cdots \times A_J \triangleq \{(\alpha_1, \alpha_2, \dots, \alpha_J)\,|\,\alpha_j \in A_j\ \ \forall j = 1, 2, \dots, J\} \tag{2.37} \]

These subsets, $A_j$ for every $j = 1, 2, \dots, J$, on the real line may be discrete,
continuous, or mixed. Within this work, we will sometimes use the terms “parameter
vector”, “mode”, and “model” interchangeably even though the parameter vector
only refers to part of the model used to represent the difference between the system
modes.

An example will help to clarify this notation. Let $J = 2$ and thus $\boldsymbol{\alpha} =
[\alpha_1\ \ \alpha_2]^T$. Then let $\alpha_1 \in A_1 = [0, \infty)$ represent the unknown nonnegative scalar
multiplier used to specify the dynamics noise strength, $\mathbf{Q} = \alpha_1 \mathbf{I}$, where $\mathbf{I}$ is an identity
matrix of the appropriate size. Next, let $\alpha_2 \in A_2 = \{0, 1\}$ be an important parameter
in the dynamics matrix, $\mathbf{F}$, that is either zero or one. If we were to conduct an
experiment like this, we would be performing system identification in an uncertain noise
environment. Finally, we see that parameter space $A = A_1 \times A_2 = [0, \infty) \times \{0, 1\}$ is
indeed a subset of $\mathbb{R}^2$. Since the MMAE fundamentally assumes that a parameter
may only assume values from among a finite set, we will need to approximate the
nonnegative real line $[0, \infty)$ with a discrete set of points rather than the continuous set of
points currently assumed. How to choose the “best” set of points is still an active
area of research.

2.3.3.2  Parameter Space Discretization.  The earliest attempt to
sample  or  discretize  the  admissible  set  of  parameter  values  was  accomplished  by 
Sengbush  and  Lainiotis  [174];  they  proposed  two  algorithms  for  a  binary  quantization 
of  the  admissible  set.  A  decade  later,  Lamb  and  Westphal  [110]  used  a  simplex 
method  of  nonlinear  programming  to  direct  the  discretization  process. 

The process of “choosing” $K$ points in the parameter space is often called
discretization; the collection of the $K$ points is called the parameter set24. The goal of
parameter discretization is to represent the parameter space accurately with a small
set of discrete points in order to reduce the computational burden and to increase
the distinguishability of the elemental filters in the filter bank. The success of the
MMAE depends on the distinguishability of the models used in the bank of elemental
filters. To determine which parameter value to use, there must be appreciable
differences between the characteristics of the residuals for the “correct” model versus
the other, mismatched, filters. Additionally, when Kalman filters are used in the
bank, conservative tuning should be avoided to prevent the residuals from becoming
too close together and affecting the discrimination of the algorithm; this effect will
be discussed in more depth later. In the limit as the residuals become indistinguishable,
the adaptation process is totally incapacitated. For fast and reliable parameter
identification, assuming one of the hypothesized filter models is based on the true
parameter value, the residuals should be as distinct as possible [124, 130].

24Some authors refer to this entire process as defining the model set, i.e., defining which models
to use in the bank of filters; see, for example, [120, 117, 119]. Note that defining the model set is
more general since it also includes filter bank composition.

The set $A$ is the admissible set of parameter values, called points, that the
parameter vector may assume. This admissible set is normally a subset of
$J$-dimensional Euclidean space and is commonly called the parameter space. In order
to implement the multiple model algorithm, the designer must choose (in some
intelligent fashion) a subset of points from $A$ in order to represent the parameter
space25.

Continuing the example of the previous section, we choose a maximum practical
value for the set $A_1$ to be 10. Thus we now have26 the closed interval
$A_1 = [0, 10]$; we shall discretize it into the set $\{0, 5, 10\}$. After discretizing $A_1$ we
have27 $A = \{0, 5, 10\} \times \{0, 1\} = \{(0, 0), (5, 0), (10, 0), (0, 1), (5, 1), (10, 1)\}$. We now
have six points in $A$ from which to choose, thus the number of elemental filters, $K$,
is equal to six. Additionally, we shall assume that all six of these points represent
legitimate parameter values with which we can design a filter. Since our discretization
of $A_1$ was completely arbitrary, it is possible that an alternate discretization of
the half-line $A_1 = [0, \infty)$ would yield better results in terms of improved state or
parameter estimation.
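
The product-set construction in this example can be sketched in a few lines of Python. The variable names below (`A1`, `A2`, `parameter_set`) are ours and purely illustrative; the values are those of the running example.

```python
from itertools import product

# Discretized sets from the running example (illustrative names).
A1 = [0, 5, 10]   # samples of the dynamics-noise multiplier alpha_1 on [0, 10]
A2 = [0, 1]       # binary parameter alpha_2 in the dynamics matrix F

# Cartesian product A = A1 x A2; each ordered pair is one candidate mode.
# itertools.product varies the last factor fastest, so the ordering differs
# from the listing in the text, but the set of points is the same.
parameter_set = list(product(A1, A2))

# One elemental filter per point, assuming every point is a legitimate design.
K = len(parameter_set)

print(parameter_set)
print(K)
```

If some of the ordered pairs were invalid parameter combinations, they would simply be filtered out of `parameter_set` before the bank is built, so that the product-set size acts as an upper bound on the number of filters.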

The simplest (and most likely the least effective [177]) approach to discretizing
the parameter space is to divide the domain uniformly for each of the $J$ parameters
in the parameter vector into $N_j - 1$ intervals, where $N_j$ can be different for each of the
$J$ parameters, and then design a filter at each boundary point [53, 98]. (Note that
this is the method that we used to discretize $A_1 = [0, 10]$ in the example a couple
of paragraphs back, where $N_1 = 3$ gave rise to two intervals and the set $\{0, 5, 10\}$.)
A slightly more effective method would divide the domain uniformly for each of the
$J$ parameters in the parameter vector into $N_j$ intervals and then design a filter at
the center of each interval [177]; for this case, we would get $A_1 = \{1\tfrac{2}{3}, 5, 8\tfrac{1}{3}\}$. In
either case, we construct $N_1 N_2 = K$ filters. For some parameters, it might make
sense to space the intervals logarithmically28 [81, 136]. In theory and in practice,
the parameter space does not have to be convex, thus $K$ is the maximum number of
distinct filters that can be constructed from the product set. In other words, one or
more of the defining $J$-tuple points may, in fact, be invalid. These simple methods
are accomplished off-line and the number of filters is held constant.

25Magill [125] assumed that the parameter space was populated with a finite set of known values.

26Note that we have used the same notation for the set $A_1$ both before and after discretization
in the same spirit as when a computer program assigns a new value to an existing variable.

27As with the set $A_1$, we have redefined the product space $A$ after the discretization process to be
$A$.
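
The three simple spacing rules just described (boundary points, interval centers, and logarithmic spacing) can be compared with a minimal sketch. The function names are hypothetical, and the logarithmic rule assumes a strictly positive lower bound, which the interval $[0, 10]$ of the running example does not satisfy; the bounds 0.1 and 10 used for it below are illustrative only.

```python
import math

def boundary_points(lo, hi, n):
    """Return n points at the ends of n - 1 equal intervals spanning [lo, hi]."""
    if n < 2:
        raise ValueError("need at least two points to bound an interval")
    step = (hi - lo) / (n - 1)
    return [lo + j * step for j in range(n)]

def center_points(lo, hi, n):
    """Return n points at the centers of n equal intervals spanning [lo, hi]."""
    if n < 1:
        raise ValueError("need at least one interval")
    width = (hi - lo) / n
    return [lo + (j + 0.5) * width for j in range(n)]

def log_points(lo, hi, n):
    """Return n points uniformly spaced in log10 between lo and hi (0 < lo < hi)."""
    if lo <= 0 or hi <= lo or n < 2:
        raise ValueError("log spacing needs 0 < lo < hi and n >= 2")
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10.0 ** (math.log10(lo) + j * step) for j in range(n)]

print(boundary_points(0.0, 10.0, 3))  # the set {0, 5, 10} used in the example
print(center_points(0.0, 10.0, 3))    # approximately {1 2/3, 5, 8 1/3}
print(log_points(0.1, 10.0, 3))       # approximately {0.1, 1, 10}
```

All three rules are fixed off-line; none of them uses any measure of filter performance to place the points.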

A  better  ad  hoc  sampling  scheme  would  include  some  measure  of  performance 
to  control  the  discretization  process  and  would  (most  likely)  result  in  a  nonuniformly 
sampled  space.  We  could  begin  by  designing  the  first  Kalman  filter  in  the  bank  at 
some  nominal  point.  Then,  while  monitoring  the  estimation  accuracy  (or  some 
function  of  the  residuals29  or  estimation  errors),  vary  one  parameter  in  one  direction 
at  a  time.  Choose  the  points  which  only  allow  the  accuracy  to  degrade  by  some  set 
amount  [113,  128,  114,  49,  50]. 

Recently, Erickson [49, 50] discretized his parameter set by monitoring an
information distance measure [16] as he varied a single parameter. This so-called
Baram distance is basically the likelihood quotient of the Gaussian conditional PDF,
which will be discussed and defined in Section 2.3.3.3, Equation (2.44). Since he had
seven measurements, he expected the true value of the likelihood quotient to be
seven for a properly tuned Kalman filter. He found that choosing his parameter
points such that the likelihood quotient increased to the same value, which in this
case was 14, gave a viable parameter variation in all directions. He also invoked the
idea of foveal versus peripheral regions30 for filter bank discretization.

28This simple scheme is utilized in this research even though it may not be optimal for our
problem.

29The residual, which was defined mathematically in Equation (2.26), is simply the difference of
the predicted and observed measurements and thus contains information on how well a filter model
matches up against the true system. An appropriate and hence very common function of the
measurement residual vector is the likelihood quotient defined in Equation (2.46): $\mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)$.

Sheldon [175, 176, 177] and Lund [123, 124] have both contributed to
understanding parameter set creation via the discretization process. Sheldon’s main
contribution was an optimal discretization procedure that allowed the designer to
focus on state or parameter estimation or control regulation using a user-specified
cost functional. A weighting matrix allows the designer to tailor the weight placed
on each state, parameter, or control action. Lund proposed an online algorithm that
would maintain the distinguishability (by lowering the dynamics noise strength $\mathbf{Q}$ or
Kalman filter gain $\mathbf{K}$) of the elemental filters as environment changes necessitated
modifying the existing filter bank. Lund’s work was extended for the discrete-time
case by Miller [149] and Vasquez [198].

Vasquez [198] and Miller [149] modified Sheldon’s “static” algorithm to provide
on-line discretization of an adaptive MMAE bank; we shall call this dynamic
discretization, as the filters are allowed to be based upon piecewise constant parameter
values rather than constant ones, to respond better to a nonstationary environment.
Previous approaches required that the moving-bank algorithm store predetermined
discretizations [130, 132, 69, 68, 172]. This dynamic algorithm can run in real-time
and the optimization assumes a finite horizon (rather than an infinite horizon and
steady state values) in the computation of the discrete parameter values.
Discretizations do not have to be predetermined, although filters may be pre-computed and
stored for speed enhancement.

A  few  observations  regarding  the  discretization  process: 

1.  Although we are unlikely to have a model in the bank that exactly matches the
real world, we can still attain good performance if we interpolate between the
existing filters; this results in a blended estimate which will be discussed in
Section 2.3.4 [198]. The elemental filters should surround31 the true parameter
[198], as seen in Figure 2.2 when we have a two-dimensional parameter set,
i.e., a set of ordered pairs $(a_1, a_2)$. The open circles represent the location of
elemental filters in the bank. The black star is the true parameter at an operating
point within the filter bank, while the black squares are at points that must be
extrapolated since they are not surrounded by the bank of elemental filters.

Figure 2.2  Surround the true parameter. Legend: O elemental filter; ★ true
parameter within the filter bank; ■ true parameter outside of the filter bank.

30An analogy to the foveal and peripheral regions of human eyesight, where the foveal
high-resolution vision is concentrated at the center of the field of view and peripheral low-resolution
vision is more sensitive to light changes and covers the remaining portion of the field of view.

2.  The coarser the discretization, the farther (on average) the true parameter
is from the assumed point; in other words, discretization directly affects how
close32 the parameter estimate may be to the true parameter and how well
(or how closely) the filter bank surrounds the true parameter [198]. Figure
2.3 shows a coarse discretization of the filter bank using large circles and a fine
discretization of the filter bank using small circles for a two-dimensional
parameter set.

Figure 2.3  Coarse and fine discretization of filter bank. Legend: O elemental
filters for a coarse discretization; o elemental filters for a fine discretization; ★ true
parameter within the filter bank.

31The parameter estimate $\hat{\mathbf{a}} = [\hat{a}_1 \dots \hat{a}_J]^T$ is surrounded whenever $\mathbf{a}_\mu < \hat{\mathbf{a}} < \mathbf{a}_\nu$, i.e., whenever
$a_{\mu,j} < \hat{a}_j < a_{\nu,j}$ for some $\mu, \nu \in \{1, 2, \dots, K\}$ for all $J$ parameters.

3.  The measurement precision ($\mathbf{R}$) inherently places a lower bound on the
practical level of discretization attainable since noisy measurements (a “large” $\mathbf{R}$)
will mask a fine discretization. When multiple filters roughly match the real
world, then the probability flow between them can become unstable when these
filters are essentially indistinguishable from one another33.

32Closeness can be determined in several ways. Here it refers to “distance” between the true
parameter and the assumed parameter used to construct an elemental filter. In a later section,
closeness is determined by how accurately the predicted measurement matches the observed
measurement; see Section 2.3.4.

4.  It  is  also  possible  that  some  intermediate  level  of  discretization  can  result  in 
a  biased  estimate  since  the  most  appropriate  filter  may  be  located  “farther” 
from  the  true  value  and  the  observed  parameter  value  could  lie  between  the 
assumed  parameter  values  of  a  pair  of  filters;  however,  the  estimate  is  thus 
biased  towards  a  filter  farther  from  the  true  parameter  value  [88]. 

2.3.3.3  Elemental Filters.  Each elemental filter in the bank (shown
in Figure 2.1) represents a different system mode and is thus based upon a different
hypothesis for the parameter values, e.g., the $k$th elemental filter design model is
constructed assuming that $\mathbf{a}(t_i) = \mathbf{a}_k$. The discrete-time model equations for the
$k$th elemental filter are:

\[ \mathbf{x}_k(t_i) = \boldsymbol{\Phi}_k(t_i, t_{i-1})\,\mathbf{x}_k(t_{i-1}) + \mathbf{B}_{dk}(t_{i-1})\,\mathbf{u}(t_{i-1}) + \mathbf{w}_{dk}(t_{i-1}) \tag{2.38} \]

\[ \mathbf{z}(t_i) = \mathbf{H}_k(t_i)\,\mathbf{x}_k(t_i) + \mathbf{v}_k(t_i) \tag{2.39} \]

where the properties of $\boldsymbol{\Phi}_k(t_i, t_{i-1})$, $\mathbf{B}_{dk}(t_i)$, $\mathbf{H}_k(t_i)$, $\mathbf{Q}_{dk}(t_i)$, and $\mathbf{R}_k(t_i)$ were
discussed in Section 2.3.1. Note that most of the research on MMAE has employed the
following time-invariant system model equations:

\[ \mathbf{x}_k(t_i) = \boldsymbol{\Phi}_k(t_i - t_{i-1})\,\mathbf{x}_k(t_{i-1}) + \mathbf{B}_{dk}\,\mathbf{u}(t_{i-1}) + \mathbf{w}_{dk}(t_{i-1}) \tag{2.40} \]

\[ \mathbf{z}(t_i) = \mathbf{H}_k\,\mathbf{x}_k(t_i) + \mathbf{v}_k(t_i) \tag{2.41} \]

where the time-invariance gives $\mathbf{B}_{dk}(t_i) = \mathbf{B}_{dk}$ and $\mathbf{H}_k(t_i) = \mathbf{H}_k$, and for uniform
spacing between time samples, $\Delta t = t_i - t_{i-1}$ for all $i$, the state transition matrix
is independent of time: $\boldsymbol{\Phi}_k(t_i, t_{i-1}) = \boldsymbol{\Phi}_k(t_i - t_{i-1}) = \boldsymbol{\Phi}_k(\Delta t)$, and for stationary
noise processes: $\mathbf{Q}_{dk}(t_i) = \mathbf{Q}_{dk}$ and $\mathbf{R}_k(t_i) = \mathbf{R}_k$ for all time $t_i$. These assumptions
allow the construction of a steady-state Kalman filter model. For the purposes of
this research, we will not limit ourselves to steady state filtering.

33Note that distinguishability of filters is a function of the filter measurement residuals, which
are the difference between the predicted and observed measurements as defined in Equation (2.26).
Thus we note that parameter set creation by discretization and filter bank composition are closely
tied together.
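
Each elemental filter runs the propagate and update cycle of Equations (2.21) through (2.27) with its own hypothesized model. The following is a minimal scalar sketch of one such cycle (one state, one measurement), offered only as an illustration under that simplifying assumption; the function name `kalman_cycle` and its argument names are ours, not part of the development above.

```python
def kalman_cycle(x_prev, P_prev, z, phi, h, q_d, r_var, b_d=0.0, u=0.0, g_d=1.0):
    """One scalar propagate/update cycle for a single elemental filter.

    x_prev, P_prev : estimate and variance at t_{i-1}^+
    z              : measurement realization z_i
    phi, b_d, g_d  : scalar Phi, B_d, G_d of the hypothesized model
    h, q_d, r_var  : scalar H, Q_d, R of the hypothesized model
    """
    # Propagate, Eqs. (2.21) and (2.22).
    x_minus = phi * x_prev + b_d * u
    P_minus = phi * P_prev * phi + g_d * q_d * g_d

    # Residual covariance and gain, Eqs. (2.23) and (2.24).
    A = h * P_minus * h + r_var
    K = P_minus * h / A

    # Residual and update, Eqs. (2.25) through (2.27).
    resid = z - h * x_minus
    x_plus = x_minus + K * resid
    P_plus = P_minus - K * h * P_minus

    # The residual and its covariance are returned as well, since the
    # bank uses them to weigh the hypotheses against one another.
    return x_plus, P_plus, resid, A

# Illustrative numbers only: unit prior variance, unit measurement noise.
x_plus, P_plus, resid, A = kalman_cycle(0.0, 1.0, 1.0, phi=1.0, h=1.0, q_d=0.0, r_var=1.0)
print(x_plus, P_plus, resid, A)
```

In a bank of $K$ filters, this function would be called $K$ times per sample period, once for each hypothesized set of model values, with the same measurement $z_i$ passed to every filter.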


The appropriateness or validity of each hypothesis, $\mathbf{a}(t_i) = \mathbf{a}_k$, is readily
obtained through an analysis of the filter residuals, i.e., the difference between the
observed measurement and the predicted measurement, $\mathbf{r}_k(t_i) = \mathbf{z}_i - \mathbf{H}_k(t_i)\,\hat{\mathbf{x}}_k(t_i^-)$ [129].
This “correctness” information is encoded in the hypothesis conditional probability,
which is defined as the probability, pr{·}, that $\mathbf{a}(t_i)$ assumes the value $\mathbf{a}_k$
(for $k = 1, 2, \dots, K$), conditioned on the observed measurement history to time $t_i$
[130, 132]:

\[ p_k(t_i) = \mathrm{pr}\{\mathbf{a}(t_i) = \mathbf{a}_k|\mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.31} \]

such that

\[ p_k(t_i) \geq 0 \ \text{ for all } k \quad \text{and} \quad \sum_{k=1}^{K} p_k(t_i) = 1 \tag{2.42} \]

and the mode conditional probability density function is actually a conditional
probability mass function for a discrete random parameter vector [133, 178]:

\[ f_{\mathbf{a}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\alpha}|\mathbf{Z}_i) = \sum_{k=1}^{K} p_k(t_i)\, \delta(\boldsymbol{\alpha} - \mathbf{a}_k) \tag{2.30} \]


If we first assume that the prior probabilities $p_k(t_0)$ are known (or well
modeled), for example, $p_k(t_0) = 1/K$ for $k = 1, 2, \dots, K$, then the definition for the
hypothesis conditional probability given in Equation (2.31) can be expressed as the
following recursion [125, 107, 11, 108, 130, 132, 14]

\[ p_k(t_i) = \frac{f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{a}_k, \mathbf{Z}_{i-1})\, p_k(t_{i-1})}{\sum_{j=1}^{K} f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{a}_j, \mathbf{Z}_{i-1})\, p_j(t_{i-1})} \tag{2.43} \]
where the conditional PDF34:

\[ f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{a}_k, \mathbf{Z}_{i-1}) = \beta_k(t_i) \exp\left\{-\tfrac{1}{2}\,\mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)\right\} \tag{2.44} \]

is a zero-mean Gaussian with covariance $\mathbf{A}_k(t_i)$, scale factor

\[ \beta_k(t_i) = \frac{1}{(2\pi)^{m/2}\,|\mathbf{A}_k(t_i)|^{1/2}} \tag{2.45} \]

and measurement dimension $m$. The likelihood quotient, which is a measure of the
“correctness” of the parameter values for this particular model [130], is

\[ L_k(t_i) = \mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i) \tag{2.46} \]

where $\mathbf{r}_k(t_i)$ and $\mathbf{A}_k(t_i)$ are the residual and associated residual covariance calculated
by the $k$th Kalman filter as in Equations (2.26) and (2.23), respectively. If we
denote the true residual covariance as $\mathbf{A}_{\mathrm{true}}(t_i)$, then $\mathbf{A}_k(t_i) = \mathbf{A}_{\mathrm{true}}(t_i)$ whenever
the $k$th elemental filter properly matches the real world condition. Since the scaling
factor, $\beta_k(t_i)$, only ensures that the function always integrates to unity, the important
(“shape”) information in this PDF is encoded in the likelihood quotient, $L_k(t_i)$, the
weighted square of the residuals.

It has been shown [94, 129] that the sequence of residuals $\{\mathbf{r}_k(t_i)\}$ resulting
from linear filtering forms a zero-mean white Gaussian sequence with known residual
covariance $\mathbf{A}_k(t_i)$. Thus, if a filter model matches the “true” system, then the
residual $\mathbf{r}_k(t_i)$ should be a zero-mean white Gaussian process with known residual
covariance $\mathbf{A}_k(t_i)$.
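
The recursion of Equation (2.43), with the Gaussian likelihood of Equations (2.44) through (2.46), can be illustrated with a small sketch for a scalar measurement ($m = 1$). The function names and the numerical values are illustrative assumptions; the guard against a zero denominator is our addition and is not part of the recursion itself.

```python
import math

def gaussian_likelihood(resid, A):
    """Scalar (m = 1) form of Eq. (2.44): beta_k * exp(-L_k / 2)."""
    beta = 1.0 / math.sqrt(2.0 * math.pi * A)   # Eq. (2.45) with m = 1
    L = resid * resid / A                       # Eq. (2.46), likelihood quotient
    return beta * math.exp(-0.5 * L)

def update_probabilities(priors, residuals, covariances):
    """One step of the hypothesis probability recursion, Eq. (2.43)."""
    weighted = [gaussian_likelihood(r, A) * p
                for r, A, p in zip(residuals, covariances, priors)]
    total = sum(weighted)
    if total == 0.0:
        # Added safeguard (an assumption of this sketch): every likelihood
        # underflowed, so the normalization is undefined.
        raise ZeroDivisionError("all hypothesis likelihoods are zero")
    return [w / total for w in weighted]

# Two hypotheses with equal priors; filter 1 has the smaller residual.
posterior = update_probabilities([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(posterior)
```

With equal residual covariances the scale factors cancel in the ratio, so the probabilities are driven entirely by the likelihood quotients: the filter whose residual is small relative to its computed covariance gains probability at the expense of the other.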

Since we did not derive Equation (2.43), it might not be readily apparent that
the denominator is simply the PDF for the current measurement conditioned on the
past measurements [210], i.e.,

\begin{align}
f_{\mathbf{z}(t_i)|\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{Z}_{i-1}) &= \int_{A} f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\boldsymbol{\alpha}, \mathbf{Z}_{i-1})\, f_{\mathbf{a}(t_i)|\mathbf{Z}(t_{i-1})}(\boldsymbol{\alpha}|\mathbf{Z}_{i-1})\, d\boldsymbol{\alpha} \tag{2.47} \\
&= \sum_{j=1}^{K} f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{a}_j, \mathbf{Z}_{i-1})\, p_j(t_{i-1}) \tag{2.48}
\end{align}

where the second equality is due to the sifting property of the Dirac delta, as in
Equation (2.30). This observation allows us to interpret the discretization of the
parameter space into a discrete set of points, each representing a system mode [210].
Recall that the total probability theorem requires that an event, such as the current
measurement, be partitioned into a set of disjoint or mutually exclusive partitions
such that the union of these partitions equals the event in question [159]. This is in
agreement with the discretization guidance given in Section 2.3.3.2. Thus, proper
sampling of the parameter space is analogous to proper partitioning of the event space
as required by the total probability theorem. In other words, the representation of
the parameter space by a discrete set of points is essentially an insightful use of the
total probability theorem [210].

34Note that the conditional PDF $f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\boldsymbol{\zeta}_i|\mathbf{a}_k, \mathbf{Z}_{i-1})$, where $\boldsymbol{\zeta}_i$ is a dummy variable for the
stochastic measurement process, becomes a real number $f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i|\mathbf{a}_k, \mathbf{Z}_{i-1})$ when
evaluated with the measurement $\boldsymbol{\zeta}_i = \mathbf{z}_i$ at time $t_i$.

2.3.4  State  and  Parameter  Estimates.  The  MMAE  estimation  technique 
uses  the  information  from  all  of  the  Kalman  filter  residuals  to  estimate  the  “true” 
parameter  vector  in  effect  and  thus  determine  the  true  system  mode.  This  technique 
is  optimal  when  there  is  a  unique  filter  paired  to  each  of  a  finite  number  of  system 
modes. We shall populate the filter bank with $K$ filters, each based on a unique
$J$-dimensional parameter vector.

From the Bayesian point of view, the MMAE framework can be used to compute
a state (or parameter) estimate that is characterized by minimizing the MSE
between the predicted and measured state estimates; this is most often called a
minimum mean-squared error (MMSE) estimate and is the conditional mean. An
alternate approach is called the MAP estimate; its estimate corresponds to the largest
hypothesis conditional probability and is dubbed the “closest” model to the real
world; the corresponding estimate is the conditional mode. We identify the Bayesian
estimate as the standard MMAE estimate and write it as [130, 132, 14]:

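\[ \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^+) = E\{\mathbf{x}(t_i)|\mathbf{Z}(t_i) = \mathbf{Z}_i\} = \sum_{k=1}^{K} \hat{\mathbf{x}}_k(t_i^+)\, p_k(t_i) \tag{2.49} \]

where $\hat{\mathbf{x}}_k(t_i^+)$ is the state estimate generated by the $k$th Kalman filter based on the
assumption that the parameter vector $\mathbf{a}(t_i) = \mathbf{a}_k$. The conditional covariance of $\mathbf{x}(t_i)$
is [130, 14]

\begin{align}
\mathbf{P}_{\mathrm{MMAE}}(t_i^+) &= E\{[\mathbf{x}(t_i) - \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^+)][\mathbf{x}(t_i) - \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^+)]^T|\mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.50} \\
&= \sum_{k=1}^{K} \left\{\mathbf{P}_k(t_i^+) + [\hat{\mathbf{x}}_k(t_i^+) - \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^+)][\hat{\mathbf{x}}_k(t_i^+) - \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^+)]^T\right\} p_k(t_i) \tag{2.51}
\end{align}

where $\mathbf{P}_k(t_i^+)$ is the state error covariance computed by the $k$th Kalman filter.
Additionally, the PDF of the state of the system, given the measurement history
$\mathbf{Z}(t_i) = \mathbf{Z}_i$, is given by a weighted sum of Gaussian PDFs known as a Gaussian
mixture [14]

\[ f_{\mathbf{x}(t_i)|\mathbf{Z}(t_i)}(\boldsymbol{\xi}|\mathbf{Z}_i) = \sum_{k=1}^{K} \mathcal{N}[\mathbf{x}(t_i);\, \hat{\mathbf{x}}_k(t_i^+),\, \mathbf{P}_k(t_i^+)]\; p_k(t_i) \tag{2.52} \]

where $\mathcal{N}[\mathbf{x}(t_i);\, \hat{\mathbf{x}}_k(t_i^+),\, \mathbf{P}_k(t_i^+)]$ is the Gaussian (normal) PDF of $\mathbf{x}(t_i)$ for the $k$th
elemental filter. The parameter estimate is given by:

\[ \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i^+) = E\{\mathbf{a}(t_i)|\mathbf{Z}(t_i) = \mathbf{Z}_i\} = \sum_{k=1}^{K} \mathbf{a}_k\, p_k(t_i) \tag{2.53} \]

with conditional covariance of $\mathbf{a}(t_i)$ [129]:

\begin{align}
\mathbf{P}_{\mathbf{a},\mathrm{MMAE}}(t_i^+) &= E\{[\mathbf{a}(t_i) - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i^+)][\mathbf{a}(t_i) - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i^+)]^T|\mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.54} \\
&= \sum_{k=1}^{K} [\mathbf{a}_k - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i^+)][\mathbf{a}_k - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i^+)]^T\, p_k(t_i) \tag{2.55}
\end{align}
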

Using the hypothesis conditional probabilities, we can assign a ranking of
closeness of the assumed parameter to the true parameter value. Hence the MAP-MMAE
state and parameter estimates produced by the Kalman filter with the largest
hypothesis conditional probability are given by

\[ \hat{\mathbf{x}}_{\mathrm{MAP\text{-}MMAE}}(t_i^+) = \hat{\mathbf{x}}_{k^*}(t_i^+) \tag{2.56} \]

and

\[ \hat{\mathbf{a}}_{\mathrm{MAP\text{-}MMAE}}(t_i^+) = \mathbf{a}_{k^*} \tag{2.57} \]

where

\[ k^* = \arg\left\{\max[p_1(t_i), p_2(t_i), \dots, p_K(t_i)]\right\} \tag{2.58} \]

The  Bayesian  estimates  provide  smoother  transitions  as  the  situation  changes, 
compared  to  the  MAP  estimates  which  may  jump  from  filter  to  filter.  With  a 
properly  designed  bank  of  filters,  i.e.,  one  which  has  been  properly,  if  not  optimally, 
discretized,  the  difference  between  the  Bayesian  and  MAP  estimates  is  small  [68,  139]. 
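
The Bayesian blending of Equations (2.49) and (2.51) and the MAP selection of
Equation (2.58) can be sketched in a few lines of Python. This is only an
illustrative sketch: it assumes NumPy is available, the function names
`mmae_blend` and `map_index` are hypothetical, and the two-filter numbers are
invented for the example rather than taken from any cited work.

```python
import numpy as np

def mmae_blend(x_hats, P_list, probs):
    """Blend elemental-filter outputs following Eqs. (2.49) and (2.51)."""
    x_hats = np.asarray(x_hats, dtype=float)   # shape (K, n): one estimate per filter
    P_list = np.asarray(P_list, dtype=float)   # shape (K, n, n): one covariance per filter
    probs = np.asarray(probs, dtype=float)     # shape (K,): hypothesis probabilities
    x_mmae = probs @ x_hats                    # probability-weighted mean, Eq. (2.49)
    P_mmae = np.zeros_like(P_list[0])
    for x_k, P_k, p_k in zip(x_hats, P_list, probs):
        d = (x_k - x_mmae).reshape(-1, 1)      # spread of filter k about the blend
        P_mmae += p_k * (P_k + d @ d.T)        # Eq. (2.51)
    return x_mmae, P_mmae

def map_index(probs):
    """Index k* of the most probable hypothesis, Eq. (2.58)."""
    return int(np.argmax(probs))

# Hypothetical two-filter bank with a two-element state vector.
x_hats = [[0.0, 1.0], [2.0, 1.0]]
P_list = [np.eye(2), np.eye(2)]
probs = [0.25, 0.75]

x_mmae, P_mmae = mmae_blend(x_hats, P_list, probs)
k_star = map_index(probs)
```

In this example the blended estimate lies between the two elemental estimates,
and the blended covariance grows along the first state because the filters
disagree there; the MAP estimate would simply take the second filter's output.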

2.4  Practical Performance Enhancements for Multiple Model Adaptive Estimation

Depending on the specific problem at hand, there are numerous ad hoc adjustments
that can be made to the standard MMAE structure to increase either
parameter or state estimation performance. Several researchers have compiled lists
of some of these techniques. Vasquez [198] compiled his list to improve the composition
or positioning of the bank in moving-bank MMAE. Maybeck and Hanlon [135]
assembled an assortment of methods to improve sensor/actuator failure detection
and identification. Several other improvement techniques gathered from the literature
were not previously considered under the title “enhancement”; however, they
will be labeled as such here.

2.4.1  Kalman Filter Tuning.  Since no model is exact, the art of filter
tuning is the first tool to which we turn after designing a filter. To tune an elemental
Kalman filter we adjust its process noise covariance $\mathbf{Q}_{dk}$ and/or measurement noise
covariance $\mathbf{R}_k$, using an ad hoc trial-and-error process [129, 133, 14]. Even though
this technique is not analytic, we can use physical insights to help us determine the
order of magnitude of the noise covariances. Specifically, one seeks tuning conditions
where $L_k(t_i) = \mathbf{r}_k^T(t_i)\, \mathbf{A}_k^{-1}(t_i)\, \mathbf{r}_k(t_i)$ is: (1) approximately equal to the number of
measurements $m$ when the hypothesized parameter value $\mathbf{a}_k$ is a good match to the
true parameter value and (2) significantly larger (or smaller) than $m$ when the model
is a poor match. Note that the likelihood quotient is directly affected by the tuning
of $\mathbf{R}_k$, since $\mathbf{A}_k = \mathbf{H}_k \mathbf{P}_k^- \mathbf{H}_k^T + \mathbf{R}_k$, and indirectly influenced by $\mathbf{Q}_{dk}$ through the
calculation of $\mathbf{P}_k^-$.
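
The tuning target for the likelihood quotient can be checked numerically. The
sketch below assumes NumPy, uses a hypothetical helper `likelihood_quotient`,
and draws synthetic residuals from an invented covariance `A_true`; it is meant
only to illustrate that the sample average of $L_k$ is close to $m$ when the
filter-computed covariance matches the actual residual covariance, and far from
$m$ when it does not.

```python
import numpy as np

def likelihood_quotient(r, A):
    """Return L = r^T A^{-1} r for one residual vector r and covariance A."""
    r = np.asarray(r, dtype=float)
    return float(r @ np.linalg.solve(A, r))

rng = np.random.default_rng(0)
m = 2                                             # number of measurements
A_true = np.array([[2.0, 0.3], [0.3, 1.0]])       # hypothetical true residual covariance
residuals = rng.multivariate_normal(np.zeros(m), A_true, size=5000)

# Filter whose computed covariance matches the residuals it actually sees.
mean_matched = np.mean([likelihood_quotient(r, A_true) for r in residuals])

# Filter that is overconfident: its computed covariance is four times too small.
mean_mismatched = np.mean([likelihood_quotient(r, 0.25 * A_true) for r in residuals])
```

Here `mean_matched` comes out near $m = 2$, while the overconfident filter
produces an average roughly four times larger, which is the kind of separation
the tuning process tries to achieve between good and poor hypotheses.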

While increasing $\mathbf{Q}_d$ can improve the responsiveness or performance of an
individual filter by masking assumed model inadequacies, it may result in slower
probability flow between filters because, as $\mathbf{Q}_d$ increases, distinguishability among the
filters decreases. Too high a value of $\mathbf{R}_k$ can deteriorate the detection capability of
the algorithm and will often result in detection delays [130, 132, 47, 46, 71].

2.4.2  Harmonically Balanced Kalman Filters.  Muravez [153] designed a
harmonically balanced Kalman filter bank to track a target exhibiting a maneuver
well modeled by a second-order periodically correlated acceleration (PCA) shaping
filter in the presence of uncertain measurement noise covariance. The elemental
filters were designed to cover the entire frequency range of expected power spectral
densities with constant bandwidth filters overlapping at the half-power point. This
method as originally proposed requires a large bank of filters and has only been
applied to tracking in a single dimension thus far, i.e., scalar state estimation with
scalar measurements. However, it is possible to use a smaller number of filters of the
same PCA form with good success [133, 90, 112]. Although this technique is novel,
it appears to suffer from poor distinguishability among the large number of
filters required to implement this method “properly” as given in his thesis.

2.4.3  Scalar Residual Monitoring.  One way to reduce sensor failure
identification ambiguities is through a technique called scalar residual monitoring. The
scalar likelihood quotient associated with the $j$th scalar residual is defined as [139]:

$$L_{kj}(t_i) = r_{kj}^2(t_i)\, [\mathbf{A}_k^{-1}(t_i)]_{jj} \tag{2.59}$$

which is simply the $jj$th term of the likelihood quotient$^{35}$ when written in summation
notation:

$$\begin{aligned}
L_k(t_i) &= \sum_{\mu=1}^{m} \sum_{\nu=1}^{m} r_{k\mu}(t_i)\, r_{k\nu}(t_i)\, [\mathbf{A}_k^{-1}(t_i)]_{\mu\nu} && (2.60) \\
&= r_{kj}^2(t_i)\, [\mathbf{A}_k^{-1}(t_i)]_{jj} + \sum_{\mu=1}^{m} \sum_{\substack{\nu=1 \\ \text{except } \mu=\nu=j}}^{m} r_{k\mu}(t_i)\, r_{k\nu}(t_i)\, [\mathbf{A}_k^{-1}(t_i)]_{\mu\nu} && (2.61)
\end{aligned}$$

With respect to sensor failure detection and identification, the predominant
indicator of a failure should be a large value of the $j$th scalar likelihood quotient $L_{kj}(t_i)$ in every
elemental filter except for the one designed to detect this particular failure$^{36}$. This
makes sense because this term only contains information regarding a particular
hypothesis about a particular failure [139]. As with many useful ad hoc techniques,
a threshold must be specified to compare to the scalar likelihood quotient given by
Equation (2.59). This technique may be used to augment other methods to identify
sensor failures or it may be the sole method used to identify them.

$^{35}$The likelihood quotient appears in the exponential portion of the hypothesis conditional PDF
defined in Equation (2.44).

$^{36}$Sensor failures are often modeled by zeroing out the row of the output distributor matrix
$\mathbf{H}$ corresponding to the failed device, and single actuator failures are modeled by zeroing out the
appropriate column of the input distributor matrix $\mathbf{B}$.
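
A minimal sketch of the scalar likelihood quotient test of Equation (2.59)
follows. It assumes NumPy; the function name `scalar_likelihood_quotients`, the
numerical residual and covariance, and the threshold value are all hypothetical
choices made for illustration, since the source states only that a threshold
must be specified.

```python
import numpy as np

def scalar_likelihood_quotients(r, A):
    """Return L_kj = r_kj^2 * [A_k^{-1}]_jj for every measurement j, Eq. (2.59)."""
    r = np.asarray(r, dtype=float)
    A_inv = np.linalg.inv(np.asarray(A, dtype=float))
    return r ** 2 * np.diag(A_inv)

# Hypothetical residual and filter-computed residual covariance for one filter.
r_k = [1.0, 2.0]
A_k = [[2.0, 0.0], [0.0, 0.5]]

L_kj = scalar_likelihood_quotients(r_k, A_k)
threshold = 4.0                      # hypothetical, empirically chosen threshold
flagged = L_kj > threshold           # True where sensor j looks suspicious
```

With these numbers the second scalar quotient exceeds the threshold while the
first does not, so only the second measurement channel would be flagged.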

2.4.4  $\beta$ Dominance Compensation.  The group led by Athans [10] discovered
that the “$\beta$ dominance” problem hampered flight condition estimation. Suizu
[191] tackled this problem, which occurs whenever the elemental filter likelihood
quotients are all approximately equal, i.e.,

$$L_1(t_i) \approx L_2(t_i) \approx \cdots \approx L_K(t_i), \tag{2.62}$$

where $L_k(t_i) = \mathbf{r}_k^T(t_i)\, \mathbf{A}_k^{-1}(t_i)\, \mathbf{r}_k(t_i)$. In that case, the hypothesis conditional PDF
defined in Equation (2.44) is dominated by the Gaussian PDF scale factor
$\beta_k(t_i) = [(2\pi)^{m/2} |\mathbf{A}_k(t_i)|^{1/2}]^{-1}$; hence the term “$\beta$ dominance”. Thus the
hypothesis conditional probabilities are inversely related to the determinant of the
filter-computed residual covariance $|\mathbf{A}_k(t_i)|$. Hence the model receiving the largest
probability is the one that has the smallest filter-computed residual covariance
determinant $|\mathbf{A}_k(t_i)|$. This result is useless and, worse, the algorithm still gives us answers, albeit
incorrect ones. For example, a typical representation of a sensor failure is to zero out
the row of $\mathbf{H}_k(t_i)$ corresponding to the failed sensor. All other things being equal,
the filters designed for this type of failure will tend to have smaller $|\mathbf{A}_k(t_i)|$ values,
thus an MMAE devised for sensor failure detection will be prone to false alarms on
sensors.

Two simple methods have been implemented to remove this $\beta$ dominance effect
[132, 139, 70, 145, 135, 147, 187, 186]. One method uses scalar residual monitoring
and the other simply removes the $\beta_k$ term from the PDF equation. Thus Equation
(2.44) becomes:

$$f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1}) = \exp\{-\tfrac{1}{2}\, \mathbf{r}_k^T(t_i)\, \mathbf{A}_k^{-1}(t_i)\, \mathbf{r}_k(t_i)\} \tag{2.63}$$

While Equation (2.63) is not strictly a PDF (because the area under the function
is not one), the hypothesis conditional probabilities defined in Equation (2.43) still
sum to one because of the scaling effect of the denominator.
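
The effect of dropping the $\beta_k$ term can be illustrated with a short Python
sketch. It assumes NumPy and assumes that Equation (2.43), which is not
reproduced in this section, has the usual recursive form in which each prior
probability is multiplied by its hypothesis conditional density and the results
are normalized to sum to one. The function name `update_probs` and the numbers
are hypothetical.

```python
import numpy as np

def update_probs(p_prev, residuals, covs, use_beta=True):
    """One probability update; assumes the standard normalized recursion for Eq. (2.43)."""
    dens = []
    for r, A in zip(residuals, covs):
        r = np.asarray(r, dtype=float)
        A = np.asarray(A, dtype=float)
        L = r @ np.linalg.solve(A, r)               # likelihood quotient L_k
        f = np.exp(-0.5 * L)                        # Eq. (2.63), no scale factor
        if use_beta:
            m = r.size
            beta = 1.0 / ((2.0 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(A)))
            f *= beta                               # full Gaussian density with beta_k
        dens.append(f)
    w = np.asarray(dens) * np.asarray(p_prev, dtype=float)
    return w / w.sum()                              # denominator rescales to sum to one

# Two hypothetical filters with equal likelihood quotients (both equal 2)
# but very different residual covariance determinants.
residuals = [[1.0, 1.0], [0.1, 0.1]]
covs = [np.eye(2), 0.01 * np.eye(2)]
prior = [0.5, 0.5]

p_with_beta = update_probs(prior, residuals, covs, use_beta=True)
p_without_beta = update_probs(prior, residuals, covs, use_beta=False)
```

With the scale factor included, nearly all of the probability flows to the
filter with the smaller determinant even though both quotients are equal;
without it, the probabilities remain at their prior values.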

2.4.5  Scalar Penalty Modification.  Another quantity appearing in the
definition for the conditional PDF in Equation (2.44) that has been adjusted is the factor of $\tfrac{1}{2}$
sitting out in front of the likelihood quotient $L_k(t_i)$ [70, 135]. Modifying this number
has the same effect as scaling the filter-computed residual covariance inverse $\mathbf{A}_k^{-1}$
by some scalar “penalty”; with this modification, we are essentially admitting that
our PDF is not exactly a Gaussian PDF. By increasing the magnitude of this factor, we can raise
sensitivity to large residuals, and we can decrease the probability-convergence time
for the MMAE. This technique drives the probabilities associated with large residuals
to zero faster. Ideally, this should result in a faster convergence of the conditional
probabilities; however, increasing the scalar penalty also increases fluctuations in the
probabilities and thus results in an increased false alarm rate [70, 135].
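
The influence of the scalar penalty can be seen in a small standard-library
Python sketch. The function `penalized_probs`, the quotient values, and the
assumption that the $\beta_k$ scale factor has been dropped (as in Equation
(2.63)) are illustrative choices, not a reproduction of the cited
implementations.

```python
import math

def penalized_probs(p_prev, quotients, penalty=0.5):
    """Update probabilities with exp(-penalty * L_k); penalty = 1/2 is the Gaussian case."""
    weights = [p * math.exp(-penalty * L) for p, L in zip(p_prev, quotients)]
    total = sum(weights)
    return [w / total for w in weights]

prior = [0.5, 0.5]
quotients = [1.0, 5.0]            # hypothetical L_k values; filter 2 has large residuals

p_standard = penalized_probs(prior, quotients, penalty=0.5)
p_penalized = penalized_probs(prior, quotients, penalty=1.0)
```

Doubling the penalty drives the probability of the filter with the larger
quotient toward zero much faster in a single update, which is the intended
effect; the same amplification applies to noise-induced residual excursions,
which is consistent with the increased false alarm rate noted above.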

2.4.6  Lower Bounding Conditional Probabilities.  Placing a lower bound
on each of the $K$ hypothesis conditional probabilities, $\{p_k\}$, has been used to
prevent filter lock-out [10]. While small probabilities result in excessive delays for
actuator/sensor failure identification, true filter lock-out precludes the
identification of the hypotheses associated with the probability that has been set to zero
[130, 132, 139, 69, 70, 145, 68, 47, 135, 147, 187, 46, 186]; likewise for other
applications that feature abrupt changes in the dynamics of the system such as carrier
phase ambiguity resolution [80, 79] and detection of incidents on freeways [212].

Filter lock-out occurs when a hypothesis conditional probability for an
elemental filter becomes zero. By inspection of Equation (2.43) one can easily see that once
a probability becomes zero at time $t_i$, then it will remain zero for all time $t > t_i$. This
is equivalent to removing the filter from the filter bank altogether. Consequently, it is
impossible to identify a failure associated with that particular filter once it has been
“removed.” But with an artificial lower bound in place, a filter cannot be totally
locked out and thus it may recover within a “few” iterations and properly declare
the failure. The lower bound is an empirically determined value that is greater than
zero and, for practical purposes, much less than $1/K$. Note that if the lower bound
were set at $1/K$, then all of the elemental filters would receive the same probability,
completely incapacitating the estimator. The larger the lower bound is, the
more agile the probability flow is. When a new probability is calculated that would
otherwise be less than the lower bound, it is set equal to the lower bound and the
entire set of probabilities is re-scaled so that they sum to one.
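
The bound-then-rescale step described above can be sketched with the Python
standard library alone. The function name `apply_lower_bound` and the numerical
values are hypothetical; the check that the bound lies strictly between zero and
$1/K$ reflects the guidance in the text rather than any cited code.

```python
def apply_lower_bound(probs, floor):
    """Raise any probability below `floor` to `floor`, then rescale to sum to one."""
    if not 0.0 < floor < 1.0 / len(probs):
        raise ValueError("lower bound should be positive and below 1/K")
    bounded = [max(p, floor) for p in probs]
    total = sum(bounded)
    return [p / total for p in bounded]

# Hypothetical bank of K = 3 filters in which the third filter has locked out.
computed = [0.999, 0.001, 0.0]
bounded = apply_lower_bound(computed, floor=0.001)
```

After the step no probability is exactly zero, so the third filter can regain
probability in later updates. Note that the rescaling leaves the bounded entries
very slightly below the nominal bound, which is harmless when the bound is much
smaller than $1/K$.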

One drawback to this method is that, when the blended state and parameter
estimates are calculated, as shown in Section 2.3.4, they give more weight to the
elemental filter estimates which have had their hypothesis conditional probabilities
increased by lower bounding. This can be fixed by adding logic to the estimate
calculation that simply excludes those filters that have been kept active in the filter bank
via lower bounding of computed $p_k$ values, if it becomes problematic. Alternatively,
lower bounding does not bias the MAP estimates, provided the lower bound is much
smaller than $1/K$ for a bank of $K$ elemental filters.

2.4.7  Markov Process Modeling of Hypothesis Conditional Probabilities.
Ackerson and Fu [2] introduced the concept of modeling the transition from one
system mode, represented by parameter vector $\mathbf{a}(t_{i-1})$, to a new system mode, $\mathbf{a}(t_i)$,
as a Markov process$^{37}$. These transitions represent abrupt changes in the dynamics
of the system and thus necessitate switching from one elemental filter to another
based on a different assumed parameter vector. Under this concept, the hypothesis
conditional probabilities are propagated via the Markov process as developed in
references [2, 130, 197, 22, 23, 14].

$^{37}$The Markov property allows the conditional PDF for the current parameter value to depend
not on the entire time history of parameter values, but on just the previous parameter value, i.e.,
$f[\mathbf{a}(t_i) \mid \mathbf{a}(t_{i-1}), \mathbf{a}(t_{i-2}), \ldots, \mathbf{a}(t_1)] = f[\mathbf{a}(t_i) \mid \mathbf{a}(t_{i-1})]$.

The probability that a Markov system will
transition from mode $\mathbf{a}_n$ to mode $\mathbf{a}_m$ at time $t_i$ is given by

$$T_{mn}(t_i, t_{i-1}) = \Pr\{\mathbf{a}(t_i) = \mathbf{a}_m \mid \mathbf{a}(t_{i-1}) = \mathbf{a}_n\} \tag{2.64}$$

such that $\sum_{m=1}^{K} T_{mn}(t_i, t_{i-1}) = 1$. Hence the sequence of modes $\mathbf{a}(t_0), \ldots, \mathbf{a}(t_i)$
forms a Markov chain. The hypothesis conditional probability vector at time
$t_i$ is

$$\mathbf{p}(t_i) = \mathbf{T}(t_i, t_{i-1})\, \mathbf{p}(t_{i-1}) \tag{2.65}$$

where the elements, $T_{mn}(t_i, t_{i-1})$, of the $K \times K$ transition probability matrix,
$\mathbf{T}(t_i, t_{i-1})$, are given in Equation (2.64) and the hypothesis conditional probability
vector

$$\mathbf{p}(t_i) = \begin{bmatrix} p_1(t_i) \\ \vdots \\ p_K(t_i) \end{bmatrix} \tag{2.66}$$

is composed of elements $p_k(t_i)$ for $k = 1, 2, \ldots, K$, previously given in Equation
(2.43). The difficult part of utilizing this method is to compute the elements of
matrix $\mathbf{T}(t_i, t_{i-1})$ in a meaningful manner; Sullivan and Woodall [192] have
proposed a method for “estimating” a Markov state transition matrix. Recently, Jilkov
and Li [92] have proposed four algorithms to address the case in which the transition
probability matrix is assumed to be time-invariant and random. When $\mathbf{T}$ is the
identity matrix, i.e., when the transition probabilities are $T_{mn} = \delta_{mn}$, then the “dynamic”
multiple model method becomes a “static” multiple model technique.
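
The propagation step of Equation (2.65) is a matrix-vector product, sketched
below in plain Python. The function name `propagate_probs` and the two-mode
transition matrix are hypothetical; the columns of the matrix sum to one, as
required by the constraint following Equation (2.64).

```python
def propagate_probs(T, p_prev):
    """Propagate hypothesis probabilities: p(t_i) = T p(t_{i-1}), Eq. (2.65)."""
    K = len(p_prev)
    return [sum(T[m][n] * p_prev[n] for n in range(K)) for m in range(K)]

# Hypothetical transition matrix; T[m][n] is the probability of moving
# from mode n to mode m, so each column sums to one.
T = [[0.95, 0.10],
     [0.05, 0.90]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]

p_prev = [1.0, 0.0]                      # mode 2 currently has zero probability
p_dynamic = propagate_probs(T, p_prev)   # Markov switching revives mode 2
p_static = propagate_probs(identity, p_prev)
```

With the hypothetical matrix, the second mode receives a nonzero probability
after one propagation even though it started at zero, whereas the identity
matrix leaves the probabilities untouched, which corresponds to the “static”
case described above.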

The Markov chain assumption obviates the need to employ a lower bounding
technique for the hypothesis conditional probabilities, since no matter how small the
hypothesis conditional probability, the mode can still “jump” from a high probability
to a low probability mode based on the transition probability. Unfortunately, once
we assume that the mode transitions form a Markov chain, the resulting algorithm
requires an ever growing amount of memory, hence an optimal algorithm is generally
not a feasible option for most applications. However, several suboptimal algorithms
have been developed for this Markov-switching concept$^{38}$: the generalized pseudo
Bayesian (GPB) [197, 14] and the interacting multiple model (IMM) [22, 23, 14] are
two useful suboptimal methods.

$^{38}$See the brief survey paper by Tugnait, reference [197], for two more types of suboptimal
algorithms.

An IMM estimator is an extension of MMAE with Markovian switching that
intermixes the state estimates from time $t_i$ to time $t_{i+1}$ in order to approximate the
optimal algorithm closely while realizing a huge computational cost savings as
compared to the optimal algorithm [22, 23]. Thus, the algorithm trades state estimation
accuracy for computational savings.

2.4.8  Hypothesis Swapping.  Similar to the hidden Markov modeling of the
mode transitions discussed in the previous section, “hypothesis swapping” involves
using additional knowledge about how a system operates to help estimate the current
mode or parameter. Hoffman [84] used the fact that the T-wave almost always follows
a ventricular depolarization and contraction represented by the QRS complex in an
electrocardiogram (ECG) signal. Additionally, the T-wave is followed by a variable
length “rest” period which is followed by the P-wave. For our purposes here, this
knowledge can also be thought of as another form of moving-bank MMAE since only
a subset of the entire bank of filters is used at any one time; its composition is
modified as necessary as determined by logic rules using information coded in the
measurement residuals and other a priori information; see Section 2.6.

2.4.9  Probability Smoothing.  Immediately following a change in the system
operating mode, the probabilities undergo a transition period before converging to
the “correct” solution. Probability smoothing is used to minimize the momentary
false alarms associated with these transients [70, 145, 135, 147, 187, 186]. The
probabilities are smoothed over a moving window, i.e., averaged over a number of
data samples. The size of the window is chosen empirically: a large window induces
a longer delay and a small window allows more false alarms.
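
A moving-window average of the probability vectors can be sketched with the
Python standard library. The class name `ProbabilitySmoother`, the window length
of three samples, and the sample probability history are hypothetical; the
cited works may implement the window differently.

```python
from collections import deque

class ProbabilitySmoother:
    """Average the most recent `window` probability vectors, element by element."""

    def __init__(self, window):
        self.history = deque(maxlen=window)   # oldest sample is dropped automatically

    def update(self, probs):
        self.history.append(list(probs))
        n = len(self.history)
        return [sum(column) / n for column in zip(*self.history)]

smoother = ProbabilitySmoother(window=3)
smoother.update([1.0, 0.0])
smoother.update([1.0, 0.0])
smoothed = smoother.update([0.0, 1.0])    # a one-sample transient toward filter 2
```

The single-sample jump toward the second filter is attenuated to one third in
the smoothed output, so a declaration threshold of, say, one half would not be
crossed by this transient; a genuine mode change would cross it only after the
window fills, which is the delay mentioned above.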

2.4.10  Increased Residual Propagation.  Another method used to help speed
convergence to the best model is to skip a few measurement update cycles while
continuing to propagate the Kalman filter state estimates. The filters' residuals are
still monitored, but they are allowed to grow without the masking effects of measurement
updates. This allows discrepancies between the real world and the model to become
more pronounced or visible in the residuals. The number of update samples skipped
is, of course, determined empirically. The risk involved in implementing this technique
is an increase in the fluctuations of the conditional probabilities, which gives an
increase in false alarms [70, 135], although this can be mitigated by probability smoothing.
By skipping measurement updates, we might also degrade the state estimates unless
we had an artificially high sampling rate as far as state estimation was concerned.

2.4.11  Dithering.  Dithering is the purposeful introduction of periodic (or
random) excitation to the system in order to increase the observability of actuator
failures through enhanced persistent excitation [145, 146, 58, 57, 147, 187, 71, 186,
73]. Additionally, the sinusoidal dither signal can be explicitly used to identify
mismodeled filters. For the properly modeled filter, i.e., the one that matches the real
world, the frequency content of the power spectral density (PSD) remains white$^{39}$,
while for the mismodeled filters, a spike appears at the dither frequency [71, 73].

$^{39}$Since the dither effect in the observed measurements $\mathbf{z}_i$ is matched by effects in the predicted
measurement $\mathbf{H}\hat{\mathbf{x}}(t_i^-)$, the dither is not present in the measurement residuals.

Hanlon [71, 73] harnessed the power of the subliminal dither using a new
hypothesis conditional probability calculation to maximize the observability of failed
flight control actuators. Since the dither is a highly time-correlated (usually periodic)
signal, it can be used to mark the filters that are based on a poor model with
respect to the real world with a spike at the known dither frequency in the residual’s
PSD plot. The residual’s PSD is formed by Fourier transforming the residual’s
autocorrelation. Hence, the filter that matches the real world failure will have zero-mean
white residuals, while the filters that do not match the real world failure will pass
residual signal power at the dither frequency.
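
The spectral signature of a mismodeled filter can be illustrated with synthetic
data. The sketch below assumes NumPy and uses a periodogram (squared FFT
magnitude divided by the record length) as a simple PSD estimate, which is a
stand-in for the autocorrelation-based PSD described above. The residual
sequences are artificial: white noise for the matched filter, and white noise
plus a leaked sinusoid for the mismodeled filter. The function name, record
length, dither frequency, and amplitude are all hypothetical.

```python
import numpy as np

def residual_psd(residuals):
    """Periodogram estimate of the PSD of a scalar residual sequence."""
    r = np.asarray(residuals, dtype=float)
    spectrum = np.fft.rfft(r - r.mean())
    return np.abs(spectrum) ** 2 / r.size

rng = np.random.default_rng(0)
N = 1024
dither_bin = 64                              # dither frequency of 64/1024 cycles per sample
n = np.arange(N)

white = rng.standard_normal(N)               # residual of the filter matching the real world
leak = 0.5 * np.sin(2.0 * np.pi * dither_bin * n / N)   # dither passed by a mismodeled filter

psd_matched = residual_psd(white)
psd_mismatched = residual_psd(white + leak)
peak_bin = int(np.argmax(psd_mismatched))    # frequency bin with the largest power
```

For the mismodeled residual the largest spectral value sits at the dither bin,
while the matched residual shows no comparable spike there, so inspecting the
PSD at the known dither frequency separates the two cases.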

2.4.12  Filter Pruning.  For some applications, the parameter space is
naturally discrete, hence there does exist an elemental filter that exactly matches the true
parameter state. Determining the carrier phase ambiguity for a Global Positioning
System (GPS) receiver is one such example [80, 79]. In order to converge on the
parameter estimate quickly, we must somehow eliminate or prune the filters that
are “obviously” incorrect (according to the filter measurement residuals) by
incorporating an empirically-based logic rule. Pruning reduces the number of elemental
filters in the fixed-bank structure. The distinctiveness of the parameters chosen to
represent the parameter set (i.e., the coarseness of the discretization) highly
influences what percentage of the filters will be pruned. The filter pruning technique
employed by Henderson [80, 79] provides a state estimate that is similar to using the
MAP parameter estimate. He cautioned that we must prune carefully so that noisy
measurements do not cause the algorithm to delete the “correct” filter mistakenly!

There is a design tradeoff between implementing an exhaustive bank of filters
that may have the best answer in it and a small bank of filters that is more
computationally feasible. This idea was the impetus for creating a “moving-bank” of
filters [130, 81, 136, 132]; the moving-bank MMAE will be developed in Section 2.6.
Additionally, the structure of the moving-bank MMAE has been cast in terms of a
hierarchical structure [59, 188, 203, 138, 139] discussed in Section 2.7.1 and a filter
spawning structure [54, 55], see Section 2.7.2.

For abrupt system mode (or parameter) changes, the number of required
hypotheses (or, more correctly, decision tree branches) grows. In Markov-switching
parameter systems, the number of hypotheses is $K^i$ at time $t_i$. Hence, prudent
pruning [4] and/or merging [22, 23] of hypotheses is essential [130].

2.4.13  Filter Restart.  When the difference between the predicted
measurement $\mathbf{H}_k(t_i)\, \hat{\mathbf{x}}_k(t_i^-)$ and the observed measurement $\mathbf{z}_i$ increases over time, the filter
is said to be diverging [130]. Such divergence can occur in practice because, when
tuning each elemental filter, one must avoid adding too much pseudonoise to the
filter dynamics model (i.e., increasing $\mathbf{Q}_{dk}$ too much), since although such conservative
tuning can reduce divergence, it can also incapacitate the adaptation in MMAE
algorithms. In other words, when the likelihood quotient $L_k(t_i)$ grows without bound
and surpasses some threshold, the filter is based on a poor model, i.e., this condition
is indicative of a poor match between the real world and the model [105, 106]. As
the filter diverges, its output is consequently of little value at best and misleading
at worst. Hence, a divergent filter must be restarted. One popular method for
remedying this situation is to simply re-initialize or restart the divergent filter with the
current state estimate, i.e., set $\hat{\mathbf{x}}_k(t_i^-) = \hat{\mathbf{x}}_{\mathrm{MMAE}}(t_i^-)$, where the current state
estimate is computed using only the nondivergent elemental filters [133]. Additionally,
it may prove useful to restart the covariance estimate $\mathbf{P}_k(t_i^-)$ as well.
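
One possible form of the restart logic is sketched below. It assumes NumPy; the
function name `restart_divergent`, the divergence threshold, and the numbers are
hypothetical, and the sketch resets only the state estimates (a covariance reset
would follow the same pattern). Divergence is declared here simply by comparing
each likelihood quotient to a threshold, which is one reading of the condition
described above.

```python
import numpy as np

def restart_divergent(x_hats, probs, quotients, threshold):
    """Reset divergent filters to the blend of the nondivergent filters' estimates."""
    x_hats = np.array(x_hats, dtype=float)            # copy, shape (K, n)
    probs = np.asarray(probs, dtype=float)
    diverged = np.asarray(quotients, dtype=float) > threshold
    healthy = ~diverged
    # Only restart when at least one healthy filter carries nonzero probability.
    if diverged.any() and probs[healthy].sum() > 0.0:
        weights = probs[healthy] / probs[healthy].sum()
        x_hats[diverged] = weights @ x_hats[healthy]  # blended estimate, healthy filters only
    return x_hats, diverged

# Hypothetical bank of three filters; the third has diverged.
x_hats = [[1.0, 0.0], [3.0, 0.0], [100.0, 50.0]]
probs = [0.5, 0.5, 0.0]
quotients = [2.0, 3.0, 500.0]

x_new, diverged = restart_divergent(x_hats, probs, quotients, threshold=50.0)
```

The third filter's estimate is replaced by the equally weighted blend of the
first two, while the healthy filters are left unchanged.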

2.4.14  Maximum Entropy with Identity Covariance.  Ordinarily, the
residuals $\mathbf{r}(t_i)$ are weighted by the filter-computed residual covariance $\mathbf{A}(t_i)$ when
determining the hypothesis conditional PDF $f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1})$; see Equations
(2.44), (2.45), and (2.46). However, when the filter-computed covariance, $\mathbf{A}(t_i)$,
is suspect, or varies a great deal across the parameter set, $A$, then this technique
is applied to ensure that the elemental filter with the smallest residual
autocorrelation is awarded the highest hypothesis conditional probability, $p_k(t_i)$, as computed in
Equation (2.43). We suppress the relative weighting of the residuals by setting the
covariance equal to an $m \times m$ identity matrix [175, 69, 67, 68], i.e.,

$$\mathbf{A}(t_i) = \mathbf{I} \tag{2.67}$$

and thus we obtain the maximum entropy with identity covariance hypothesis
conditional PDF:

$$f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1}) = \frac{1}{(2\pi)^{m/2}} \exp\{-\tfrac{1}{2}\, \mathbf{r}_k^T(t_i)\, \mathbf{r}_k(t_i)\} \tag{2.68}$$

Sheldon called this modification to the algorithm maximum entropy with identity
covariance (ME/I) since it maximizes the entropy of the residual information [175].
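
A probability update based on Equation (2.68) can be sketched with the Python
standard library. Because the factor $(2\pi)^{-m/2}$ is common to every filter,
it cancels when the probabilities are normalized and is omitted from the code.
The function name `mei_probs` and the residual values are hypothetical, and the
normalized recursion is assumed to have the usual form of Equation (2.43).

```python
import math

def mei_probs(p_prev, residuals):
    """ME/I update: weight each filter by exp(-r^T r / 2) and renormalize."""
    weights = [p * math.exp(-0.5 * sum(x * x for x in r))
               for p, r in zip(p_prev, residuals)]
    total = sum(weights)
    return [w / total for w in weights]

prior = [0.5, 0.5]
residuals = [[0.1, 0.2], [1.0, 2.0]]     # hypothetical residual vectors for two filters

p_mei = mei_probs(prior, residuals)
```

The filter with the smaller residual magnitude receives the larger probability
regardless of what either filter believes its residual covariance to be.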

2.4.15  Pseudo-Residuals.  A novel method for detecting measurement bias
jumps (such as GPS spoofing [206, 205, 204]) uses a pseudo-residual in place of the true
measurement residual as defined in Equation (2.26). The pseudo-residuals are used
only for inspecting the residuals while the true residuals are used to update the
elemental filters. While true residual sequences (at steady state) are zero-mean
white Gaussian sequences with known covariances, the pseudo-residual sequences
have a nonzero mean equal to the assumed bias. Hence this formulation allows the
bias to be detected.

2.4.16  Generalized Residuals.  For some applications, one might conjecture
that the MMAE may benefit from a different form for the measurement residual. For
instance, when the uncertain parameter ($\mathbf{a}$) affects the measurement model’s
structure ($\mathbf{H}$) and/or statistics ($\mathbf{R}$) and propagation errors dominate the state estimate
[80, 79], then perhaps an analysis of the “post-fit” residuals

$$\mathbf{r}(t_i^+) = \mathbf{z}_i - \mathbf{H}(t_i)\, \hat{\mathbf{x}}(t_i^+) \tag{2.69}$$

with an error covariance of:

$$\mathbf{A}(t_i^+) \triangleq \mathbf{H}(t_i)\, \mathbf{P}(t_i^+)\, \mathbf{H}^T(t_i) + \mathbf{R}(t_i) \tag{2.70}$$

may be the best method for adaptively estimating the parameter; compare to
the standard forms given in Equations (2.26) and (2.23), respectively. That is, the
distinguishability of the elemental filters is (assumed to be) more evident through
an analysis of the post-fit residuals than with the standard set. In a more detailed
analysis, Ormsby [156] showed that these post-fit residuals actually resulted in no
performance improvement when compared to an MMAE using traditional residuals.
In showing this he constructed a generalized residual as a weighted sum of the
traditional and post-fit residuals:

$$\mathbf{r}^*(t_i) = \gamma\, \mathbf{r}(t_i^-) + (1 - \gamma)\, \mathbf{r}(t_i^+), \tag{2.71}$$

where the scalar $\gamma$ is chosen by the designer to optimize the performance. When
$\gamma = 1$ we have the traditional residual and when $\gamma = 0$ we have the post-fit residual.
There is no theory regarding how to determine an optimal $\gamma$; however, it is suspected
that the optimal $\gamma$ is a number between zero and one.
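
The weighted combination in Equation (2.71) is straightforward to compute once
the pre-update and post-update state estimates are available. The sketch below
assumes NumPy; the function name `generalized_residual`, the measurement model,
and the Kalman gain `K_gain` are hypothetical values used only to produce
concrete numbers.

```python
import numpy as np

def generalized_residual(z, H, x_minus, x_plus, gamma):
    """Weighted sum of traditional and post-fit residuals, Eq. (2.71)."""
    r_minus = z - H @ x_minus          # traditional (pre-update) residual
    r_plus = z - H @ x_plus            # post-fit residual, Eq. (2.69)
    return gamma * r_minus + (1.0 - gamma) * r_plus

# Hypothetical scalar measurement of a two-state system.
H = np.array([[1.0, 0.0]])
z = np.array([2.0])
x_minus = np.array([1.0, 0.5])
K_gain = np.array([[0.6], [0.1]])                    # hypothetical Kalman gain
x_plus = x_minus + K_gain @ (z - H @ x_minus)        # standard measurement update

r_traditional = generalized_residual(z, H, x_minus, x_plus, gamma=1.0)
r_post_fit = generalized_residual(z, H, x_minus, x_plus, gamma=0.0)
r_blend = generalized_residual(z, H, x_minus, x_plus, gamma=0.5)
```

Setting `gamma` to one or zero recovers the traditional and post-fit residuals,
respectively, and intermediate values interpolate linearly between them.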

Ormsby [156, 157] showed that previous researchers [80, 79] who used the
post-fit residuals would have gotten equivalent results using the traditional form of
the residuals. One side effect of the generalized residual, for $\gamma \neq 1$, is the $\beta$
dominance effect which was previously discussed in Section 2.4.4; hence researchers
must be careful when choosing the weighting factor.

2.5  A  Virtual  Filter  Bank  Using  Only  a  Single  Filter 

Hanlon and Maybeck [71, 72] have proposed a computation-saving virtual
filter bank using a single Kalman filter combined with a set of linear transformations,
rather than a set of $K$ elemental filters$^{40}$. The linear transformations capture the
differences between the model used for the single filter and the models for the
virtual filters. The virtual filter bank is composed of a bank of linear transformations
that compute equivalent state estimates and residuals based on a single Kalman
filter which produces the reference state estimate and measurement residual. They
have developed the necessary linear transforms to model differences in the input
distributor matrix $\mathbf{B}_d$$^{41}$, output or measurement distributor matrix $\mathbf{H}$, and the state
transition matrix $\mathbf{\Phi}$. Their development is based on the time invariant model as
in Equations (2.16) and (2.17). Additionally, they have assumed that the Kalman
filter models and the truth model dynamics noise covariance $\mathbf{Q}_d$, measurement noise
covariance $\mathbf{R}$, and noise distributor matrices $\mathbf{G}_d$ are equivalent. They indicate that
these conditions are common in failure detection applications for which the MMAE
methodology has been used.

$^{40}$While this section does not directly support the work documented in this dissertation, it is an
example of potentially useful insights.

$^{41}$For the purposes of this section, the subscript $d$, which is simply a reminder of the discrete
nature of the quantity, will be suppressed in the discussion following this introductory paragraph.

The development begins by rewriting the measurement residuals as defined in
Equation (2.26) for the $j$th filter with the goal of eliminating the explicit mention of
the state estimate from the $j$th filter. After several lines of algebra, they write:

$$\begin{aligned}
\mathbf{r}_j(t_i) = \mathbf{r}_k(t_i) &+ \mathbf{H}_j \mathbf{\Phi}_j\, \Delta\boldsymbol{\epsilon}_{jk}(t_{i-1}^+) + [\mathbf{H}_j\, \Delta\mathbf{\Phi}_{kj} + \Delta\mathbf{H}_{kj}\, \mathbf{\Phi}_k]\, \hat{\mathbf{x}}_k(t_{i-1}^+) \\
&+ [\mathbf{H}_j\, \Delta\mathbf{B}_{kj} + \Delta\mathbf{H}_{kj}\, \mathbf{B}_k]\, \mathbf{u}(t_{i-1})
\end{aligned} \tag{2.72}$$

where the difference in the state estimation errors for the $j$th and $k$th filters, $\Delta\boldsymbol{\epsilon}_{jk}(t_i^+)$,
is algebraically equivalent to the difference in the state estimates

$$\Delta\boldsymbol{\epsilon}_{jk}(t_i^+) = \boldsymbol{\epsilon}_j(t_i^+) - \boldsymbol{\epsilon}_k(t_i^+) = \hat{\mathbf{x}}_k(t_i^+) - \hat{\mathbf{x}}_j(t_i^+) \tag{2.73}$$

which can be expressed using the following recursion:

$$\begin{aligned}
\Delta\boldsymbol{\epsilon}_{jk}(t_i^+) = {} & (\mathbf{I} - \mathbf{K}_j \mathbf{H}_j)\, \mathbf{\Phi}_j\, \Delta\boldsymbol{\epsilon}_{jk}(t_{i-1}^+) \\
& + [(\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)\, \mathbf{\Phi}_k - (\mathbf{I} - \mathbf{K}_j \mathbf{H}_j)\, \mathbf{\Phi}_j]\, \hat{\mathbf{x}}_k(t_{i-1}^+) \\
& + [(\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)\, \mathbf{B}_k - (\mathbf{I} - \mathbf{K}_j \mathbf{H}_j)\, \mathbf{B}_j]\, \mathbf{u}(t_{i-1}) + \Delta\mathbf{K}_{kj}\, \mathbf{z}(t_i)
\end{aligned} \tag{2.74}$$

and finally, four equivalence relations

$$\begin{aligned}
\Delta\mathbf{B}_{kj} &= \mathbf{B}_k - \mathbf{B}_j \\
\Delta\mathbf{H}_{kj} &= \mathbf{H}_k - \mathbf{H}_j \\
\Delta\mathbf{\Phi}_{kj} &= \mathbf{\Phi}_k - \mathbf{\Phi}_j \\
\Delta\mathbf{K}_{kj} &= \mathbf{K}_k - \mathbf{K}_j
\end{aligned} \tag{2.75}$$

While  the  linear  transforms  represented  by  Equations  (2.72)  and  (2.74)  appear 
to  be  very  complex,  in  practice,  they  often  simplify  drastically  —  at  least  for  the 
failure  detection  application  that  Hanlon  and  Maybeck  have  addressed. 

2.5.1  An  Example:  Different  Input  Distributor  Matrices.  To  create  an 
actual  filter  to  model  a  single  actuator  failure,  we  would  zero  out  a  column  of  the 
input  distributor  matrix  B.  If  we  assume  that  the  only  difference  between  the 
reference  model  (call  it  the  kth  filter)  and  this  one  used  to  detect  an  actuator  failure 
in  the  jth  filter,  then  of  the  four  increment  matrices  given  in  Equation  (2.75)  only 
ABfcj  is  nonzero,  hence  Equations  (2.74)  and  (2.72)  simplify  to 

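$$\Delta\mathbf{e}_{jk}(t_i^+) = (\mathbf{I} - \mathbf{K}_j\mathbf{H}_j)\boldsymbol{\Phi}_j\,\Delta\mathbf{e}_{jk}(t_{i-1}^+) + (\mathbf{I} - \mathbf{K}_j\mathbf{H}_j)\,\Delta\mathbf{B}_{kj}\,\mathbf{u}(t_{i-1}) \tag{2.76}$$

and

$$\mathbf{r}_j(t_i) = \mathbf{r}_k(t_i) + \mathbf{H}_j\boldsymbol{\Phi}_j\,\Delta\mathbf{e}_{jk}(t_{i-1}^+) + \mathbf{H}_j\,\Delta\mathbf{B}_{kj}\,\mathbf{u}(t_{i-1}) \tag{2.77}$$

Hanlon and Maybeck [71, 72] have developed similar relations for the case of
a single sensor failure (different measurement distributor matrices, i.e., $\Delta\mathbf{H}_{kj} \neq \mathbf{0}$,
which also means that the Kalman gains are likely to be different, hence $\Delta\mathbf{K}_{kj} \neq \mathbf{0}$)
and for different state transition matrices $\boldsymbol{\Phi}_j \neq \boldsymbol{\Phi}_k$ (again $\Delta\mathbf{K}_{kj} \neq \mathbf{0}$).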
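
The simplified relations (2.76) and (2.77) can be checked numerically. The following is a minimal sketch, not Hanlon and Maybeck's implementation: the two-state system, the noise strengths, and the function name `equivalent_residual_demo` are illustrative assumptions. Two filters that differ only in the input matrix are run side by side, and the jth residual is also rebuilt from the kth residual alone.

```python
import numpy as np

def equivalent_residual_demo(steps=25, seed=0):
    """Return the largest mismatch between actual and equivalent quantities."""
    rng = np.random.default_rng(seed)
    Phi = np.array([[1.0, 0.1], [0.0, 0.9]])   # shared state transition matrix (assumed)
    H = np.array([[1.0, 0.0]])                 # shared measurement matrix (assumed)
    B_k = np.array([[0.0, 0.5], [1.0, 0.2]])   # reference (kth) input matrix
    B_j = B_k.copy()
    B_j[:, 1] = 0.0                            # jth filter: second actuator zeroed out
    dB = B_k - B_j                             # Delta B_kj from (2.75)
    Q, R, I = 0.01 * np.eye(2), np.array([[0.1]]), np.eye(2)

    x_true = np.zeros(2)
    x_k = np.zeros(2)                          # kth filter state estimate
    x_j = np.zeros(2)                          # jth filter state estimate
    de = x_k - x_j                             # Delta e_jk, initially zero
    P = np.eye(2)
    worst = 0.0
    for _ in range(steps):
        u = rng.normal(size=2)
        x_true = Phi @ x_true + B_k @ u + rng.multivariate_normal(np.zeros(2), Q)
        z = H @ x_true + rng.normal(scale=np.sqrt(R[0, 0]), size=1)

        # The gain does not depend on B, so both filters share it.
        P_minus = Phi @ P @ Phi.T + Q
        K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)
        P = (I - K @ H) @ P_minus

        r_k = z - H @ (Phi @ x_k + B_k @ u)            # actual kth residual
        r_j = z - H @ (Phi @ x_j + B_j @ u)            # actual jth residual
        r_j_equiv = r_k + H @ Phi @ de + H @ dB @ u    # equivalent residual (2.77)
        de = (I - K @ H) @ (Phi @ de + dB @ u)         # recursion (2.76)

        x_k = Phi @ x_k + B_k @ u + K @ r_k
        x_j = Phi @ x_j + B_j @ u + K @ r_j
        worst = max(worst,
                    np.abs(r_j - r_j_equiv).max(),
                    np.abs(de - (x_k - x_j)).max())
    return worst
```

In this sketch the jth filter is propagated only to provide a comparison; in a virtual filter bank it would be dropped and `r_j_equiv` used in its place.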

2.5.2 Equivalent Residuals. While we won’t reproduce their work here,
we must note that they spent considerable effort showing that the
equivalent residual produced by Equation (2.72) is essentially identical to the
measurement residual from the actual filter, with any difference nominally within the
precision of the simulation software. Thus, this technique is viable provided that the extra work
necessary to set up the new algorithmic apparatus is more than offset by the reduced
computational load provided by this framework.

2.5.3 Computational Savings. While results will vary depending on the
particular application, Hanlon and Maybeck have estimated (using an operations counting technique)
that  this  virtual  filter  bank  design  can  yield  savings  of  about  30%  for  the  case  of 
different  input  distributor  matrices  as  in  the  example  above.  Similarly,  they  found 
that  the  equivalent  residual  version  of  the  Kalman  filter  bank  reduces  the  required 
operations  count  by  about  15%  compared  to  the  fully  implemented  Kalman  filter 
bank.  It  is  noted  that  there  are  basically  no  savings  if  the  differences  are  confined 
to  the  state  transition  matrices. 

2.5.4 Comment. While the thrust of the current research is to extend the
MMAE, one cannot overemphasize the practical matter of reducing the computational
load whenever possible. Aside from the utility of this formulation, Hanlon
and Maybeck have noted the similarity of this work to the generalized likelihood
ratio test that also uses the residuals from a single Kalman filter to detect failures
[211, 71, 72, 198].
2.6 Moving Bank Multiple Model Adaptive Estimation Fundamentals

Until  now,  the  system  mode  (parameter  vector)  was  assumed  to  be  time- 
invariant  for  static  fixed  bank  MMAEs.  If  the  system  mode  is  allowed  to  vary  slowly 
with  time  over  a  large  set  of  admissible  parameter  vectors,  then  a  finely  discretized 
parameter  space  may  give  rise  to  a  prohibitively  large  collection  of  elemental  filters. 
Since  the  number  of  filters  kept  on-line  is  generally  limited,  we  are  motivated  to 
consider  methods  capable  of  adjusting  the  composition  of  the  filter  bank.  Maybeck 
[130]  suggested  an  ad  hoc  approach  to  track  slowly  varying  parameter  vectors  via  a 
“dynamically  redefinable  (or  ‘moving’)  bank  of  filters”  as  opposed  to  a  fixed  “static” 
bank  of  filters.  With  this  technique,  the  bank  of  filters  can  be  both  closely  spaced 
(a  necessary  condition  for  producing  good  state/parameter  estimates)  and  still  cover 
the  entire  range  of  system  modes  while  adhering  to  the  constraint  of  keeping  only  K 
elemental  filters  online  at  any  point  in  time. 

Hentz and Maybeck [81, 136] completed the first feasibility study of this “moving
bank” MMAE structure. The moving-bank MMAE estimates the parameter
vector using only a small subset of the entire filter bank. Subsequent investigations
were undertaken by Maybeck, Gustafson, Griffin, Schiller, Vasquez, and Erickson
[132, 172, 68, 64, 198, 200, 199, 49, 201, 50] to reduce the number of on-line filters
required in the MMAE bank. Li and Bar-Shalom [120] have proposed a similar architecture
for the IMM called variable structure; this enhancement allows the model-set
to be dynamically redeclared online.

If the parameter estimate changes appreciably, as indicated by the filter residual
statistics, then the bank of on-line filters should be adjusted so that the parameter
estimate is always surrounded by a “moving” bank of elemental filters^42. Hence a
primary purpose for changing the elemental filters used in the MMAE bank is to
track the true parameter vector value using a small number of filters with fine
discretization of the parameter space to reduce the number of filters that must be kept
on-line. A second reason for changing the elemental filters used in the MMAE filter
bank is to acquire (or reacquire) the true state vector using a coarse discretization
of the entire parameter space; this may occur multiple times during the course of a
simulation or actual use. The composition of this moving bank is governed by a set
of logic rules that will be discussed shortly.

^42 The subset of filters appears to move through the parameter set as the parameter changes while
time unfolds.

2.6.1 Moving-Bank versus Fixed-Bank Multiple Model Adaptive Estimation.
The moving-bank MMAE posed by Maybeck and Hentz is the same as the filter
bank estimator discussed previously except that K now refers to a smaller
number of elemental filters in the moving bank rather than the total number
of possible elemental filters based on all possible discrete parameter vector values,
usually $K \ll K_{\text{fixed-bank}}$, where $K_{\text{fixed-bank}}$ is the total number of filters in
the reservoir known as the fixed bank. Usually the bank of K elemental filters
“moves” within a previously constructed fixed bank of elemental filters such that
each $\mathbf{a}_k \in \{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{K_{\text{fixed-bank}}}\} \subset A$, for $k = 1, \ldots, K$, as was originally
implemented and investigated [132, 172, 68, 64]; also, the K elemental filters may be
created on-line and may “roam” throughout the entire parameter space, i.e., $\mathbf{a}_k \in A$,
by discretizing the parameter set on-line as reported by Miller, Vasquez, and Maybeck
[149, 198, 200, 199, 201]. Several decision logics designed to reassign the K
“mobile” filters from the larger set of $K_{\text{fixed-bank}}$ filters have been suggested and are
presented later in this section. Several of these rules are also used by the dynamic
bank implementation created on-line based on a modified formulation of Sheldon’s
optimal parameter discretization strategy [175, 176, 177].

In addition to moving the filter bank while we track the parameter changes,
we may also expand the filter bank region of coverage^43 when it appears, i.e., when
the statistics of the filter residuals suggest it, that the true parameter value lies
outside the bounds of the current bank. To expand the bank, we simply increase the
coarseness of the discretization. This will hopefully place the true parameter value
within the confines of the moving bank once again; i.e., we’d like to surround the
parameter we are trying to estimate. Note that moving the bank and expanding the
bank are two separate decisions.

^43 When the region of coverage is expanded, the number of elemental filters in the filter bank is
usually held constant. The IMM variable structure approach [120] is one method that allows the
number of elemental filters to vary.

Once  a  bank  has  been  expanded  in  order  to  recapture  the  true  value,  the 
quality  (or  accuracy)  of  the  parameter  estimate  falls  off  (assuming  that  the  number 
of  filters  in  the  bank  is  kept  constant)  until  we  contract  the  bank.  In  other  words, 
it is important that the true parameter lie within the filter bank’s coverage area
for  adequate  parameter  estimation.  This  will  increase  our  chances  that  one  of  our 
elemental  filters  is  “close”  to  the  true  parameter  value.  The  level  of  discretization 
of  the  continuous  parameter  set  directly  impacts  the  ability  of  the  filter  bank  to 
surround  and  come  as  close  as  possible  to  the  true  parameter  value. 

Many researchers have proposed novel ways of adjusting the bank of elemental
filters, e.g., the new set of rules developed by Vasquez and Maybeck
[198, 199, 200, 201]. Additionally, several authors have proposed different “moving-bank”
architectures for modifying the bank of filters. For example, the “filter spawning”
architecture [54, 55] will be discussed in Section 2.7.2, while the variable structure
approach designed by Li and Bar-Shalom [120, 118] is outside the scope of this
research.

2.6.2  A  Short  Glossary  of  Bank  Manipulation  Terms.  Before  proceeding 
with  a  review  of  the  logic  rules  used  to  control  the  composition  of  the  filter  bank, 
we  shall  discuss  a  few  important  terms  regarding  the  movement  of  the  moving-bank 
MMAE  framework: 

Contract  The filter bank contracts when the discretization level is made finer
(assuming that the same number of filters is maintained in the bank).

Expand  The filter bank expands when the discretization level is made coarser
(assuming that the same number of filters is maintained in the bank).

Move  To move the filter bank is to re-center the bank on the latest parameter
estimate while the discretization level is (possibly) held constant.

Surround  The filter bank surrounds the true parameter value when the bank
contains filters that bound the parameter estimate both above and below in all
“directions” of the parameter set.

Track  The filter bank tracks the true parameter value by keeping the parameter
estimate surrounded at all times; this may require the move, expand, and
contract filter bank operations.
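
The glossary terms can be made concrete with a small sketch for a scalar parameter. The class `Bank1D`, its uniform spacing, and the factor of two used for expansion and contraction are illustrative assumptions, not part of any of the cited implementations.

```python
from dataclasses import dataclass

@dataclass
class Bank1D:
    """Hypothetical uniformly spaced bank of K filters over a scalar parameter."""
    center: float
    spacing: float
    size: int = 5   # K, the number of on-line filters (held constant throughout)

    def points(self):
        # Parameter values hypothesized by the K elemental filters.
        half = (self.size - 1) / 2
        return [self.center + (k - half) * self.spacing for k in range(self.size)]

    def surrounds(self, a_hat):
        # Surround: the estimate is bounded above and below by bank members.
        pts = self.points()
        return pts[0] <= a_hat <= pts[-1]

    def move(self, a_hat):
        # Move: re-center on the latest estimate, same discretization level.
        return Bank1D(a_hat, self.spacing, self.size)

    def expand(self, factor=2.0):
        # Expand: coarser discretization with the same number of filters.
        return Bank1D(self.center, self.spacing * factor, self.size)

    def contract(self, factor=2.0):
        # Contract: finer discretization with the same number of filters.
        return Bank1D(self.center, self.spacing / factor, self.size)
```

Tracking, in this picture, is the repeated application of `move`, `expand`, and `contract` so that `surrounds` remains true for the current estimate.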

2.6.3  Logic  Rules  for  Moving  the  Bank.  Five  standard  decision  logics  have 
been  suggested  and  investigated  by  Maybeck  [132]  and  others  in  order  to  keep  the 
estimate  of  the  parameter  within  the  bounds  of  the  bank.  A  brief  summary  of 
each  follows.  Additionally,  Vasquez  and  Maybeck  [198,  199,  201]  have  developed  an 
algorithm that exploits the information contained in a conditional PDF to move the
filter  bank. 


2.6.3.1 Residual Monitoring. The likelihood quotient defined in
Equation (2.46):

$$L_k(t_i) = \mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i), \tag{2.46}$$

captures the useful information pertaining to the correctness of the parameter values.
For scalar measurements, this is simply the current residual squared, divided by the
filter computed variance for the residual: $L_k(t_i) = r_k^2(t_i)/A_k(t_i)$. When the true
parameter value does not lie in the current moving-bank region, all K likelihood
quotients can be expected to exceed a threshold level $T_L$, the numerical value of
which is set in an ad hoc manner during performance evaluations^44. Thus, a possible
detection logic would indicate that the bank should be either moved or expanded at
time $t_i$ if

$$\min\{L_1(t_i), \ldots, L_K(t_i)\} > T_L \tag{2.78}$$

In other words, we should expand the moving bank when all of the likelihood quotients
are too large. If we apply the test in Equation (2.78) to only a specific subset of
the current bank, we may infer movement of the true parameter and thus direct that
the bank be moved in order to track the true parameter. For example, if all of the
likelihood quotients for the filters along the bank’s edge exceed some threshold, then
the appropriate action would be to move the bank in the opposite direction where
the likelihood quotients are smaller. This tool is prone to false alarms because it is
based on a single residual at time $t_i$. We could lower false alarms by averaging over
several time samples, but this strategy would tend to decrease our responsiveness to
real world changes in the parameter, which, as you will recall, is the hallmark of this
method. Hence we need a better logic rule.
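
As a minimal sketch of the residual monitoring test, assuming each elemental filter supplies its current residual and filter-computed residual covariance, the likelihood quotients and the min-versus-threshold decision could be coded as follows. The function names and the example threshold are hypothetical.

```python
import numpy as np

def likelihood_quotient(r, A):
    """Quadratic form r^T A^{-1} r of Equation (2.46) for one elemental filter."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    A = np.atleast_2d(np.asarray(A, dtype=float))
    return float(r @ np.linalg.solve(A, r))

def bank_needs_change(residuals, covariances, T_L):
    """Return (flag, quotients); flag is True when every quotient exceeds T_L."""
    quotients = [likelihood_quotient(r, A) for r, A in zip(residuals, covariances)]
    return min(quotients) > T_L, quotients
```

For example, `bank_needs_change([2.0, 3.0], [1.0, 1.0], T_L=3.0)` flags the bank because both quotients (4 and 9) exceed the threshold, whereas a single small residual in the bank is enough to suppress the flag. Because the decision uses only the current sample, it inherits the false-alarm sensitivity described above.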

2.6.3.2 Parameter Position Estimate Monitoring. Since our intent
may well be to track the actual value of the parameter through the parameter set, we
could explicitly monitor the actual position estimate of the parameter as a function
of time. Recall that the Bayesian MMAE parameter estimate given by Equation
(2.53) is^45

$$\hat{\mathbf{a}}(t_i) = E\{\mathbf{a}(t_i) \mid \mathbf{Z}(t_i) = \mathbf{Z}_i\} = \sum_{k=1}^{K} \mathbf{a}_k\, p_k(t_i) \tag{2.79}$$

If the difference between the parameter estimate $\hat{\mathbf{a}}(t_i)$ and the “center” of the filter
bank, $\mathbf{a}_{\text{center}}$, becomes too large, i.e., larger than some chosen threshold, then the
filter bank should be moved in such a manner as to bring the center $\mathbf{a}_{\text{center}}$ and $\hat{\mathbf{a}}$ closer
together. Since $\hat{\mathbf{a}}(t_i)$ depends on a history of measurements rather than just the
single current measurement, it is less prone to false alarms than the
simple residual monitoring method discussed above.

^44 See Hanlon’s [71] work on the Neyman-Pearson hypothesis testing process as an alternative to the
ad hoc method with which the threshold is set.

^45 We have suppressed the difference between the propagated and updated estimates here since this
is not germane to the discussion.


2.6.3.3 Parameter Position and Velocity Estimate Monitoring. Another
way to incorporate more data into our decision making logic is to use the
history of $\hat{\mathbf{a}}(t_i)$ to generate a meaningful estimate of the parameter velocity^46 if the
true parameters vary slowly with time; otherwise we will get what may appear to
be random motion:

$$\dot{\hat{\mathbf{a}}}(t_i) = \frac{\hat{\mathbf{a}}(t_i) - \hat{\mathbf{a}}(t_{i-1})}{t_i - t_{i-1}} \tag{2.80}$$

The parameter estimate velocity $\dot{\hat{\mathbf{a}}}(t_i)$ and current position estimate $\hat{\mathbf{a}}(t_i)$ can be used
to predict the parameter position one sample period into the future by rearranging
Equation (2.80):

$$\hat{\mathbf{a}}(t_{i+1}) = \hat{\mathbf{a}}(t_i) + \dot{\hat{\mathbf{a}}}(t_i)\,[t_{i+1} - t_i] \tag{2.81}$$

If the distance between the bank center and that prediction $\hat{\mathbf{a}}(t_{i+1})$ exceeds some
selected threshold, then the bank can be moved in anticipation of the true parameter
movement. This approach introduces lead into the moving-bank logic, but also a
higher level of uncertainty and possibly erratic bank movement if the true value of
the parameter changes too rapidly.
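
The position and velocity logic can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the probability-weighted estimate follows Equation (2.79), and the Euclidean norm is assumed as the distance measure between the bank center and the prediction.

```python
import numpy as np

def parameter_estimate(a_points, probs):
    """Probability-weighted blend of the hypothesized parameter values, as in (2.79)."""
    return np.asarray(probs, dtype=float) @ np.asarray(a_points, dtype=float)

def predict_parameter(a_now, a_prev, t_now, t_prev, t_next):
    """Finite-difference velocity (2.80) and one-step-ahead prediction (2.81)."""
    velocity = (a_now - a_prev) / (t_now - t_prev)
    return a_now + velocity * (t_next - t_now)

def should_move(a_center, a_pred, threshold):
    """Move the bank when the prediction strays too far from the bank center."""
    distance = np.linalg.norm(np.atleast_1d(a_pred - a_center))
    return bool(distance > threshold)
```

Passing the current estimate itself as `a_pred` reduces `should_move` to the position-only rule of the previous subsection; the velocity term simply adds lead.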


2.6.3.4 Probability Monitoring. The conditional hypothesis probabilities,
$p_k(t_i)$, computed using Equation (2.43), are another indication of the correctness
of the parameter values $\mathbf{a}_k$ assumed by the elemental filters of the current bank.
If any of these probabilities rises above a chosen threshold level, the bank can be
moved in the direction of the $\mathbf{a}_k$ associated with the highest $p_k(t_i)$. In this scheme,
the bank seeks to center itself on the elemental filter with the highest conditional
probability weighting. Again, since $p_k(t_i)$ depends on a history of measurements,
this method should be somewhat insensitive to singular instances of measurement
corruption, which is not the case under residual monitoring.

^46 Velocity is simply the time rate of change of position.
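
A minimal sketch of the probability monitoring rule follows. The function name and the convention of returning `None` when no move is indicated are assumptions made for illustration.

```python
def probability_move_target(a_points, probs, threshold):
    """Return the parameter value to re-center on, or None if no move is indicated.

    a_points: parameter values hypothesized by the K on-line filters.
    probs:    their conditional hypothesis probabilities at the current time.
    """
    k_best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[k_best] > threshold:
        return a_points[k_best]   # bank seeks to center on the most probable filter
    return None
```

With `probs = [0.1, 0.2, 0.7]` and a threshold of 0.6, the bank would be directed toward the third filter's parameter value; with a threshold of 0.8 it would stay put.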

2.6.3.5 Parameter Estimation Error Covariance Monitoring.
Whereas residual monitoring may be used to increase the spacing of the filters in the
filter bank, i.e., expand the bank, this technique allows us to contract the bank by
making the discretization of the parameter set finer. By starting with a coarsely
discretized parameter set, $\mathbf{a}_1, \ldots, \mathbf{a}_K$, we increase our chances of surrounding the
true parameter value. Then with proper testing we may contract the bank centered
on that parameter estimate. A good way to help make such a contraction decision
is to monitor the parameter estimation error conditional covariance [130] given in
Equations (2.54) and (2.55) and repeated here (minus the “MMAE” subscripts)

$$\mathbf{P}_{\mathbf{a}}(t_i) = E\{[\mathbf{a}(t_i) - \hat{\mathbf{a}}(t_i)][\mathbf{a}(t_i) - \hat{\mathbf{a}}(t_i)]^T \mid \mathbf{Z}(t_i) = \mathbf{Z}_i\} \tag{2.82}$$

$$\phantom{\mathbf{P}_{\mathbf{a}}(t_i)} = \sum_{k=1}^{K} [\mathbf{a}_k - \hat{\mathbf{a}}(t_i)][\mathbf{a}_k - \hat{\mathbf{a}}(t_i)]^T\, p_k(t_i) \tag{2.83}$$

When an appropriately chosen norm (a scalar function indicating size or distance)
of this matrix falls below a selected threshold, the bank can be contracted about the
parameter estimate. One such norm is the weighted sum of the diagonal terms of
the covariance matrix $\mathbf{P}_{\mathbf{a}}(t_i)$ for a moving bank constrained to be a square region in
a two-dimensional parameter set [132]:

$$\|\mathbf{P}_{\mathbf{a}}(t_i)\| = \sum_{k=1}^{K} P_{a_{kk}}(t_i) \tag{2.84}$$

In general, one could use different discretization coarseness decisions in individual
directions of the parameter set, allowing rectangular banks as well as squares.
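
The contraction test can be sketched as follows, assuming the covariance is formed as the probability-weighted outer-product sum over the bank members. The function names are hypothetical, and an unweighted trace is used as the norm (all weights set to one), which is an assumption rather than the weighting used in [132].

```python
import numpy as np

def parameter_covariance(a_points, probs):
    """Probability-weighted spread of the bank about its blended estimate."""
    a = np.asarray(a_points, dtype=float)    # shape (K, n): one row per filter
    p = np.asarray(probs, dtype=float)       # shape (K,)
    a_hat = p @ a                            # blended parameter estimate
    d = a - a_hat                            # deviations a_k - a_hat
    return (d * p[:, None]).T @ d            # sum_k p_k d_k d_k^T

def should_contract(a_points, probs, threshold):
    """Contract when the sum of the diagonal terms falls below the threshold."""
    return bool(np.trace(parameter_covariance(a_points, probs)) < threshold)
```

Note that the result can never exceed the spread of the current bank members, which is why the same quantity is a poor indicator for expansion.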
Note that we can rewrite the state equations for a deterministic time-invariant
system^47 into diagonal canonical form if the eigenvalues of the system transfer function
are unique (non-repeated) or into a modified canonical form if there are repeated
roots. Similarly, we can use the eigenvalues and eigenvectors of the covariance matrix
$\mathbf{P}_{\mathbf{a}}(t_i)$ at time $t_i$ to find the principal axes. With these principal axes, we could
describe elliptical shaped banks in the parameter set [129].

An indication of the need to expand the size of the bank can be obtained from
residual monitoring as before. When all of the likelihood quotients from Equation
(2.46) are large in magnitude (indicating that none of the current elemental filters
appear to have a good model or hypothesized parameter value), then it is more
appropriate to expand the bank than to attempt to move it because no clear indication
of the true parameter’s value is provided with this particular bank of filters.
The error covariance could then be monitored for making the decision to return the
bank to a smaller size. Since Equation (2.83) depends on the current choice of $\mathbf{a}_k$
values, this error covariance is not a reliable indicator for the decision to expand
because the computed $\mathbf{P}_{\mathbf{a}}(t_i)$ is artificially bounded above by the current size of the
bank. Regardless of which technique is used to move, contract, or expand the filter
bank, the newly declared models must be initialized with values for $\hat{\mathbf{x}}_k(t_i)$, $\mathbf{P}_k(t_i)$,
and $p_k(t_i)$. A common and reasonable choice for $\hat{\mathbf{x}}_k(t_i)$ is the current moving-bank
blended estimate $\hat{\mathbf{x}}(t_i^+)$. For the new $p_k(t_i)$, we equally divide up the probability
weight of the discontinued filters, i.e., if filters one through three had a total probability
of just one tenth, then each of the new filters will have probability of one
third of one tenth. Another method apportions the probability based on the relative
correctness of the new filters being added [81, 136]. This correctness is based on an
evaluation of the likelihood quotients determined from Equation (2.46) for each new
filter; it is used to divide the probability proportionately. This method equates the
smallest likelihood quotient with the most correct filter and thus the most correct
filter shall receive the greatest probability allocated to the new filters. However,
this apportionment technique doesn’t usually perform better in practice relative to
simpler equal-distribution methods.

^47 The deterministic time-invariant system is

$$\dot{\mathbf{x}}(t) = \mathbf{F}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$$
$$\mathbf{z}(t_i) = \mathbf{H}\mathbf{x}(t_i)$$

where $\mathbf{F}$, $\mathbf{B}$, and $\mathbf{H}$ are as described in Section 2.3.1. The system transfer function matrix for this
system is

$$\mathbf{G}(s) = \mathbf{H}[s\mathbf{I} - \mathbf{F}]^{-1}\mathbf{B} \tag{2.85}$$

where s is the Laplace transform variable [129].
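
The two ways of initializing the probabilities of newly declared filters can be sketched as below. The equal split follows the description above directly. For the proportional split, the source states only that the smallest likelihood quotient earns the largest share; weighting by the reciprocal of the quotient is an assumption made here for illustration, as are the function names.

```python
def equal_split(discarded_prob, n_new):
    """Divide the probability of the discontinued filters evenly among n_new filters."""
    return [discarded_prob / n_new] * n_new

def likelihood_weighted_split(discarded_prob, quotients):
    """Apportion probability so that smaller likelihood quotients earn larger shares.

    Assumption: shares are proportional to 1 / L_k (quotients must be positive).
    """
    weights = [1.0 / L for L in quotients]
    total = sum(weights)
    return [discarded_prob * w / total for w in weights]
```

For the example in the text, `equal_split(0.1, 3)` gives each new filter one third of one tenth. Both functions conserve the total probability handed to the new filters.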

2.6.3.6 Density Algorithm: Logic Rules for Moving the Bank. As
opposed to the simple rules just discussed in the preceding subsections, the “density
algorithm” developed by Vasquez and Maybeck [198, 199, 201] provides intelligent
decision making for movement, contraction, and expansion of the adaptive MMAE
filter bank. The density algorithm gets its name by exploiting information provided
by the hypothesis conditional probability density function
$f_{\mathbf{z}(t_i)|\mathbf{a},\mathbf{Z}(t_{i-1})}(\boldsymbol{\zeta}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1})$
defined in Equation (2.44). Unfortunately, this algorithm relies heavily on uniform
spacing of the parameter values of the online filters. Note that uniform parameter
value spacing is not a usual feature of the bank composition. This prompted Vasquez
[198] to combine the basic density algorithm with a new online discretization technique
that does not rely on simple uniform spacing of the parameter values.

2.6.4  Hypothesis  Testing.  Multiple  model  estimation  employs  hypothesis 
testing  in  a  variety  of  ways,  using  the  filter  measurement  residuals  to  help  make 
decisions.  Hypothesis  testing  of  the  residuals  (or  more  commonly,  a  function  of  the 
residuals)  is  used  to  determine  when  the  composition  of  the  bank  should  be  changed 
and  how  it  should  be  modified.  The  testing  of  the  hypotheses  is  how  changes  in 
the  system  are  detected;  this  is  known  as  detection  theory,  see,  e.g.,  [170,  101].  The 
multiple  model  estimation  schemes  discussed  in  this  document  would  not  be  possible 
without  hypothesis  testing. 
The proximity of the parameter estimate to the true value is dependent on the
coarseness of the discretization. Larger than expected residual magnitudes indicate
a mismatch between the filter model and the “truth” model. The truth model is the
best model that we can build irrespective of its feasibility of employment; we desire
the closest possible representation of the real world system. A parameter change in
the true system would be reflected as a change in magnitude of the residuals of filters
based on different hypotheses when the change occurred. The change in the residuals
can appear as a nonzero mean or a change of covariance.

In  addition  to  the  hypothesis  conditional  probability  computed  for  each  model, 
see  Equation  (2.43),  we  can  also  gather  additional  information  about  the  models 
through  decision  theory  via  hypothesis  testing.  Hypothesis  testing  is  a  process  of 
establishing  the  validity  of  a  hypothesis,  where  the  statistical  hypothesis  is  simply  an 
“assumption  about  the  value  of  one  or  more  parameters  of  a  statistical  model”  [159] 
or  more  precisely,  an  “assertion  about  the  [probability]  distribution  of  one  or  more 
random  variables”  [86].  The  null  hypothesis  is  usually  the  condition  of  no  change 
while  the  alternative  hypothesis  encapsulates  the  “change.”  The  hypothesis  test  is  a 
rule which can be used to determine whether or not to reject the null hypothesis^48.

The two common classes of errors are the type I and type II errors. Type I
errors occur when we mistakenly reject a null hypothesis that is actually in force
(true); a type II error happens when we fail to reject (i.e., accept) a null hypothesis
that is not in force, i.e., false [168]. For our work, the null hypothesis occurs when
the system is functioning as intended, while a malfunctioning system would be the
alternate hypothesis; as the name suggests, there may be more than one alternate
hypothesis in a multiple hypothesis testing scheme. When a hypothesis test falsely
indicates that the system is malfunctioning, a type I error has occurred; this error
is commonly known as a false alarm. Similarly, a type II error is committed whenever
the test indicates that the system is functioning properly, when in fact it is operating
“poorly” in some respect; this error is often called a missed detection in that the
test missed detecting the problem. When applied to the target detection problem,
a false alarm simply says that there was a target present when none truly was, and
a missed detection is simply that the target was present, but that the test indicated
a target free area.

^48 Note that a hypothesis that is not rejected is not necessarily true; this acceptance merely
indicates that the data considered support the conjecture. Likewise, a rejected hypothesis is not
necessarily false [168].

2.6.4.1 Chi-Squared Test. The chi-squared test is one of the earliest
methods used for statistical inference [86]; in failure detection, it uses the measurement
residuals from a (Kalman) filter to determine whether a failure has occurred.
Thus, this test gathers knowledge of the system dynamics by interpreting the filter
measurement residuals. The chi-squared random variable, $\chi^2(t_i)$, provides a test
statistic that places a quadratic penalty on residual variance for the kth Kalman
filter model [198]:

$$\chi^2(t_i) = \sum_{l=i-N+1}^{i} \mathbf{r}_k^T(t_l)\,\mathbf{A}_k^{-1}(t_l)\,\mathbf{r}_k(t_l) \tag{2.86}$$

where N is the size of the sliding window used to make the decision. A detection
rule with an empirically determined threshold $T_{\chi^2}$ is:

$$\begin{aligned}
\chi^2(t_i) > T_{\chi^2} &\;\rightarrow\; \text{Parameter Change} \\
\chi^2(t_i) \le T_{\chi^2} &\;\rightarrow\; \text{No Parameter Change}
\end{aligned} \tag{2.87}$$

The threshold will be chosen to meet the performance specifications and to minimize
false alarms (type I errors) and missed detections (type II errors) [170]. The chi-squared
test has been a highly effective and consistent failure detector [152]; however,
this test is basically an alarm method that isn’t of much use for isolating failures if
not used in a multiple model structure [211].
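
A sliding-window detector of this kind can be sketched as follows. The sketch assumes that the test statistic is the sum, over the last N samples, of the quadratic form $\mathbf{r}^T\mathbf{A}^{-1}\mathbf{r}$ for one filter's residuals; the class name `ChiSquaredDetector` and its interface are hypothetical.

```python
from collections import deque

import numpy as np

class ChiSquaredDetector:
    """Windowed sum of residual quadratic forms compared against a threshold."""

    def __init__(self, window, threshold):
        self.terms = deque(maxlen=window)   # keeps only the last N quadratic terms
        self.threshold = threshold          # empirically chosen threshold

    def update(self, r, A):
        """Add the newest residual r (covariance A); return (statistic, alarm)."""
        r = np.atleast_1d(np.asarray(r, dtype=float))
        A = np.atleast_2d(np.asarray(A, dtype=float))
        self.terms.append(float(r @ np.linalg.solve(A, r)))
        chi2 = sum(self.terms)
        return chi2, chi2 > self.threshold
```

A longer window smooths out isolated bad measurements at the cost of slower detection, which is the same trade-off noted for residual monitoring.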


2.6.4.2 Neyman-Pearson Test. The most powerful test is defined
as the hypothesis test which yields the greatest probability of detection for a given
level of false alarms. The Neyman-Pearson (hypothesis) test, based on the Neyman-Pearson
lemma, yields the most powerful test [170]. Hanlon [71] replaced the standard
N-ary hypothesis test with a Neyman-Pearson based hypothesis test extended
especially for the MMAE structure. In an N = 2 binary hypothesis test, the null
hypothesis is either rejected or accepted, whereas in a Neyman-Pearson test, a third
option located “between” acceptance and rejection of the null hypothesis is available.
The somewhat noncommittal third response is to reject the null hypothesis
with a certain probability as determined by the test function; thus acceptance of
the null hypothesis is the same as rejection with zero probability. With Hanlon’s
Neyman-Pearson hypothesis testing algorithm [71], the residual sequence from a single
Kalman filter could be used to perform multiple hypothesis tests, as opposed to
having an entire bank of filters feeding an N-ary hypothesis testing algorithm used
by the standard MMAE.

2.6.4.3 Sequential Probability Ratio Test. For “short” fixed-length
data sets, there may not be enough information to distinguish between the various
hypotheses. In this case, it is desirable to have a test that continues to collect
data until there is enough information to make a decision. When this occurs, the
sequential probability ratio test (SPRT) developed by Wald [202] is superior to the
Neyman-Pearson test [120].
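
The idea of collecting data until a decision can be made is illustrated by the sketch below. It uses Wald's standard approximate stopping thresholds; the specific setting (two hypothesized means for independent Gaussian samples with known standard deviation) and the function name `sprt` are assumptions chosen for illustration, not the MMAE formulation.

```python
import math

def sprt(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Sequential test of H0: mean mu0 versus H1: mean mu1 for Gaussian samples.

    alpha is the desired false alarm rate, beta the desired missed detection rate.
    Returns (decision, number of samples used).
    """
    upper = math.log((1 - beta) / alpha)   # decide H1 at or above this level
    lower = math.log(beta / (1 - alpha))   # decide H0 at or below this level
    llr = 0.0                              # cumulative log-likelihood ratio
    for n, x in enumerate(samples, start=1):
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(samples)
```

Unlike a fixed-length test, the number of samples consumed depends on how informative the data are: samples lying midway between the two hypothesized means leave the test undecided indefinitely.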

2.6.4.4 Generalized Likelihood Ratio Test. The generalized likelihood
ratio (GLR) test is similar to the chi-squared test, but it has the capability to detect
abrupt changes in dynamic systems and isolate failures [214, 211, 213, 212, 152,
206, 198, 101]. A hypothesis for each type of failure that can affect the system
is constructed. The GLR test processes the measurement residuals from a single
Kalman filter in parallel in order to detect changes (failures) in the system, whereas
in MMAE, we have a bank of Kalman filters. One key benefit of the GLR is that it
needs only one estimator for each failure type since it produces its own estimate of the
failure magnitude. This magnitude may be used in a feedback loop to aid the system
in terms of recovery or reconfiguration. The MMAE has generally outperformed the
GLR test.


2.7 Advanced Moving-Bank Multiple Model Adaptive Estimation Structures and Techniques

In  this  section,  we  shall  briefly  discuss  several  advanced/specialized  techniques 
beginning  with  two  moving-bank-like  structures  that  have  been  applied  to  the  failure 
detection  problem.  Next  we’ll  cover  two  algorithms  for  discretizing  the  parameter 
space.  Then  we’ll  investigate  a  modification  to  the  hypothesis  conditional  PDF  which 
results  in  a  hypothesis  testing  algorithm  designed  not  only  to  detect  failures,  but  to 
isolate  them.  Finally,  we  briefly  introduce  a  method  for  improving  state  estimation 
by  using  an  MMAE  to  identify  the  system  mode,  i.e.,  the  unknown  system  parameter, 
and  then  using  a  separate  Kalman  filter  for  state  estimation. 

2. 1. 1  Hierarchical  Structured  Filter  Bank.  Hierarchical  estimation  theory 
was  studied  by  Smith  and  Sage  [179],  blended  with  Magill’s  multiple  model  method 
[125]  by  Fry  and  Sage  [59],  and  later  implemented  by  Stevens,  Maybeck,  and  others 
[188,  138,  139,  145,  147,  47,  46,  35]  as  a  way  to  reduce  the  required  number  of 
elemental  filters  to  detect  multiple  actuator/sensor  failures  for  reconfigurable  flight 
control  via  MMAE  methods.  If  a  multiple  model  algorithm  were  based  on  all  possible 
single and double failures of N sensors and actuators, then we would need filters to
correspond to the cases of: fully functional (1 filter), single failure (N filters), and
double failures ($\frac{N!}{(N-2)!\,2!}$ filters49), for a total of $1 + N + \frac{N!}{(N-2)!\,2!}$ elemental filters.

To  avoid  this  computational  burden,  we  could  cast  the  problem  into  a  hierarchical 
structure where we only have N + 1 on-line filters at any given time. The initial bank
of filters is denoted by the title “Level Zero”, which corresponds to zero failures

49Recall that for positive integer N, the factorial is the product: $N! = N \cdot (N-1) \cdots 1$.
detected, i.e., the models are based on the condition that no failures have been
detected. We also have N banks of N + 1 filters on standby: the “Level One”
banks. Upon confirmation of an initial failure, we bring on-line the appropriate
bank designed with the assumption that a failure has occurred in the nth sensor
or actuator surface. In this “Level One” bank, we also include the “no failure”
filter to handle the possibility that the sensor (or actuator) was mistakenly declared
inoperative, or to allow for the possibility that it can recover; hence N + 1 filters. So,
at most, we will have 11 filters on-line in the case of N = 10 sensors, versus 56 for the
full bank: quite a savings!
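The filter counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative and do not come from the cited works.

```python
from math import comb


def full_bank_size(n: int) -> int:
    """Filters needed to hypothesize every no-, single-, and double-failure case."""
    return 1 + n + comb(n, 2)  # comb(n, 2) = n! / ((n - 2)! 2!)


def hierarchical_online_size(n: int) -> int:
    """Filters on-line at any one time in the hierarchical structure (one bank of N + 1)."""
    return n + 1


n_sensors = 10
print(full_bank_size(n_sensors), hierarchical_online_size(n_sensors))  # 56 11
```

The gap widens quickly with N because the double-failure term grows quadratically while the on-line bank grows linearly.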

2.7.2  Filter Spawning.  Filter spawning is a type of moving-bank filter
structure which focuses on improved parameter identification with computational
savings. This architecture was implemented to help determine the amount of degradation
suffered by a failing or failed actuator/sensor. This work originated with
Fisher [54, 55]; it features a permanent collection of filters that are always on-line
and an additional set of filters that may be called upon to augment the standard
set as necessary. These augmenting sets of filters are “spawned” only after a specific
actuator surface is declared to have failed to some degree, i.e., a partial failure is
detected. These spawned filters assist in determining the level of failure.

2.7.3  Optimal Parameter Discretization.  Several ad hoc techniques designed
to discretize the parameter set were discussed in Section 2.3.3.2. Sheldon
[175, 176, 177] cast the previously ad hoc process of continuous variable parameter
discretization as an optimization problem designed to improve either the state or
parameter estimation or state control/regulation. His research delivered a static optimal
parameter discretization since the parameter set was discretized off-line prior
to running the algorithm; hence the bank of (steady-state) filters is static50.

50Miller [149] and Vasquez [198] jointly extended the static optimal parameter discretization to
enable a bank of (time-varying) filters to be redeclared while on-line, i.e., the parameter set may
be rediscretized on-line; this is a dynamic discretization method.




The  basic  question  being  addressed  by  this  algorithm  is:  “If  we  are  allowed  to 
choose  the  K  points  in  the  parameter  set,  where  should  they  be  placed  in  order  to 
yield  ‘optimal’  MMAE  performance?”  The  optimal  MMAE  performance  was  defined 
as  that  which  minimized  the  average  mean-squared  estimation  error  of  states  or 
parameters51. 

The  key  ingredient  of  this  approach  is  the  creation  of  the  cost  functional  used 
to  optimize  the  parameter  discretization  for  state  estimation: 


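\[ \frac{\int_A E\{[\mathbf{x} - \hat{\mathbf{x}}]^T \mathbf{W}_x [\mathbf{x} - \hat{\mathbf{x}}]\}\, d\mathbf{a}}{\int_A d\mathbf{a}} \]  (2.88)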


where A denotes the admissible parameter set discussed in Equations (2.36) and
(2.37), $\mathbf{W}_x$ is the user-specified weighting matrix used to emphasize or deemphasize
various states, and the denominator, $\int_A d\mathbf{a} = \int_{A_J} \cdots \int_{A_2} \int_{A_1} da_1\, da_2 \cdots da_J$, normalizes
the contribution of the parameter set. Similarly, the cost functional minimized to
optimize the parameter estimate is:


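\[ \frac{\int_A E\{[\mathbf{a} - \hat{\mathbf{a}}]^T \mathbf{W}_a [\mathbf{a} - \hat{\mathbf{a}}]\}\, d\mathbf{a}}{\int_A d\mathbf{a}} \]  (2.89)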


where the matrix $\mathbf{W}_a$ is a user-specified weighting scheme.

Sheldon  made  several  assumptions  in  order  to  keep  the  mathematics  tractable: 
the  structure  of  the  system  model  is  known  except  for  a  J-vector  of  parameters, 
a,  from  the  infinite  set  A,  the  bank  is  composed  of  K  constant-gain,  steady-state 
filters,  and  the  MMAE  converges  to  the  “best”  model  in  the  “Baram”  sense  [16,  15] 
with  probability  one.  Given  that  there  are  sufficient  conditions  for  the  MMAE  to 
converge  in  the  Baram  sense,  Sheldon  found  that  the  MMAE  will  converge  to  the 

51Sheldon [175, 177] also applied his methodology to optimize the average mean-squared regulation
error performance of the controller as the criterion for the optimization of the MMAC.
jth filter governed by [175, 177]

\[ l_j(t_i) = \min_{k=1,\dots,K} l_k(t_i) \]  (2.90)

where $l_k(t_i)$ is defined as the proximity of the kth filter, generated as

\[ l_k(t_i) = \ln|\mathbf{A}_k(t_i)| + \mathrm{tr}\{\mathbf{A}_k^{-1}(t_i)\,[\boldsymbol{\Omega}_k(t_i) + \mathbf{R}_t(t_i)]\} \]  (2.91)

where $\mathbf{A}_k(t_i)$ is the filter-computed residual covariance, $\boldsymbol{\Omega}_k(t_i)$ is a quadratically-scaled
steady-state state-estimation error autocorrelation, and $\mathbf{R}_t(t_i)$ is the truth
measurement covariance; see [175] for the details.
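As a rough illustration of how the proximity measure of Equation (2.91) selects a filter, the sketch below evaluates it for two hypothetical filters and picks the minimizer as in Equation (2.90). The variable names (`A_k`, `Omega_k`, `R_t`) and the 2-by-2 matrices are assumptions made up for the example; they are not values from Sheldon's work.

```python
import numpy as np


def baram_proximity(A_k, Omega_k, R_t):
    """ln|A_k| + tr{A_k^{-1} [Omega_k + R_t]} for one elemental filter."""
    sign, logdet = np.linalg.slogdet(A_k)
    if sign <= 0:
        raise ValueError("residual covariance must be positive definite")
    # solve(A, M) computes A^{-1} M without forming the inverse explicitly
    return logdet + np.trace(np.linalg.solve(A_k, Omega_k + R_t))


def closest_filter(A_list, Omega_list, R_t):
    """Index of the filter with the smallest proximity value, plus all values."""
    values = [baram_proximity(A, O, R_t) for A, O in zip(A_list, Omega_list)]
    return int(np.argmin(values)), values


# Hypothetical two-filter bank with two measurements.
R_t = np.eye(2)
A_list = [2.0 * np.eye(2), 2.0 * np.eye(2)]
Omega_list = [1.0 * np.eye(2), 5.0 * np.eye(2)]

best, values = closest_filter(A_list, Omega_list, R_t)
print(best, values)
```

In this toy case the first filter has the smaller error autocorrelation term and therefore the smaller proximity value, so it is the one the bank would be expected to converge to.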

Finally, Sheldon devised a five-step (off-line) algorithm to approximate numerically
and minimize the cost functional. This algorithm required (among other
things) a truth model of the system, the number of filters, K, to be implemented in
the estimator, and an initial sample set, $\{\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_K\}$, to begin the minimization
[175]. Additionally, Sheldon extended his work to account for the design practice of
placing lower bounds on the filter probabilities, $p_k$, to avoid the estimator lock-up
problem previously discussed in Section 2.4.6.

2.7.4  Inter-Residual Distance Feedback.  The success of the MMAE depends
on the “distinguishability” of the models used in the bank of Kalman filters.
To determine which parameter value to use, there must be appreciable differences
between the characteristics of the residuals for the “correct” model versus the other,
mismatched, filters. In the limit as the residuals become indistinguishable, the adaptation
process is totally incapacitated. For fast and reliable parameter identification,
assuming one of the hypothesized filter models is based on the true parameter value,
the residuals should be as distinct as possible [124, 130].

Inter-residual distance feedback (IRDF) [123, 124] provides for the on-line modification
of the Kalman filters in the MMAE bank for the purpose of maintaining
the distinguishability of the elemental filters. The discrimination property of the
MMAE is preserved by continually adjusting the Kalman filters to keep the predicted
measurements from becoming too close with respect to some performance
metric.

Modification of the elemental filters is achieved by de-tuning the filters through
the modulation of either the dynamic driving noise covariance, $\mathbf{Q}_{dk}$, or the new
information $\mathbf{s}_k = \mathbf{K}_k \mathbf{r}_k$ directly, where $\mathbf{K}_k$ is the kth Kalman filter gain matrix and
$\mathbf{r}_k$ is the residual of the kth Kalman filter. The modulation process is governed by a
scalar quantity computed from a distance measure between the residuals. Lund
stressed the trade-off between discrimination and state tracking; tracking is the
ability of the filter to predict the state $\mathbf{x}(t_i)$ and output $\mathbf{z}(t_i)$ given the measurement
history $\mathbf{Z}_{i-1}$. The trade-off is between the desire for good tracking capabilities when
one model matches the true system versus the desire for the residuals of the various
filters to be distinct from each other, enabling fast and reliable model discrimination.
If small Kalman filter gains are used to de-emphasize the impact of the measurement
information, then the residuals of the various filters will tend to be more distant from
each other. Thus, the elemental filters should not be tuned totally independently of
the other filters [130] and the distance between the residuals should be large enough
for inter-residual distinguishability [124].

With that in mind, Lund defined the inter-residual difference measure

\[ J_{jk}(t_i) = \mathbf{r}_{jk}^T(t_i)\,\mathbf{T}_{jk}\,\mathbf{r}_{jk}(t_i), \quad \forall j \neq k \]  (2.92)

where the inter-residual difference is defined as

\[ \mathbf{r}_{jk}(t_i) = \mathbf{r}_j(t_i) - \mathbf{r}_k(t_i) \]  (2.93)

with the jth Kalman filter residual $\mathbf{r}_j(t_i)$, and $\mathbf{T}_{jk}$ is a positive definite scaling matrix
that is often diagonal. To maintain distinguishable residuals, we seek to keep $J_{jk}(t_i)$
above some threshold we’ll call $J^0_{jk}$ by adjusting filter gains.

A general approach52 used to keep the inter-residual distance measure, $J_{jk}(t_i)$,
above some specified limit $J^0_{jk}$ is to vary the dynamics noise covariance, $\mathbf{Q}_{dk}$, which
by inspection of Equations (2.22) through (2.24) in Section 2.3.2 directly modifies
the Kalman filter gains. We adjust $\mathbf{Q}_{dk}$ with a modulating parameter $\eta$ by

\[ \mathbf{Q}'_{dk}(t_i) = \eta(t_i)\,\mathbf{Q}_{dk}(t_i), \quad \forall k \in \{1, 2, \dots, K\} \]  (2.94)

where $\eta(t_i) \in [\eta_{\min}, 1]$. In order to maintain system stability, the lower bound must
be nonnegative, i.e., $\eta_{\min} \ge 0$; Lund [124] and Miller [149] both discuss how one might
smartly choose $\eta(t_i)$. Simulations are often used to help determine good values.

If the system models are linear, as they are in this research, then another way
to impart change in the system is by modulating the new information53
$\mathbf{s}_k(t_i) = \mathbf{K}_k(t_i)\,\mathbf{r}_k(t_i)$ as we did the dynamics noise:

\[ \mathbf{s}'_k(t_i) = \eta(t_i)\,\mathbf{s}_k(t_i), \quad \forall k \in \{1, 2, \dots, K\} \]  (2.95)

The filter gains are now pre-computable and only the modulation parameter $\eta(t_i)$ is
computed on-line. Furthermore, modulating $\mathbf{s}_k(t_i)$ versus $\mathbf{Q}_{dk}(t_i)$ leads to a faster
adaptation because we don’t have to wait for the filter error covariance transients to
die before the corresponding filter gains are changed.
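A minimal sketch of the IRDF bookkeeping follows: it forms the pairwise distances of Equations (2.92) and (2.93) and scales the new information as in Equation (2.95). Two simplifications are assumptions of this sketch rather than part of the cited method: a single scaling matrix is shared by every filter pair, and the rule that maps the smallest distance to the modulation parameter (`modulation`, with its `J_min` and `eta_min` arguments) is a hypothetical placeholder, since the source defers the actual choice to Lund [124] and Miller [149].

```python
import numpy as np


def inter_residual_distances(residuals, T=None):
    """Return {(j, k): J_jk} with J_jk = r_jk^T T r_jk and r_jk = r_j - r_k, j != k."""
    m = residuals[0].shape[0]
    T = np.eye(m) if T is None else T  # shared scaling matrix (assumption)
    J = {}
    for j, r_j in enumerate(residuals):
        for k, r_k in enumerate(residuals):
            if j != k:
                r_jk = r_j - r_k
                J[(j, k)] = float(r_jk @ T @ r_jk)
    return J


def modulation(J, J_min, eta_min=0.1):
    """Hypothetical rule: shrink eta toward eta_min as the smallest J_jk drops below J_min."""
    return float(np.clip(min(J.values()) / J_min, eta_min, 1.0))


def modulate_new_information(K_gain, residual, eta):
    """s'_k = eta * s_k with s_k = K_k r_k."""
    return eta * (K_gain @ residual)


# Three hypothetical filter residuals at one sample time.
residuals = [np.array([1.0, 0.0]), np.array([0.5, 0.0]), np.array([3.0, 0.0])]
J = inter_residual_distances(residuals)
eta = modulation(J, J_min=1.0)
s_prime = modulate_new_information(0.5 * np.eye(2), residuals[0], eta)
print(J[(0, 1)], eta, s_prime)
```

Because only the scalar is recomputed on-line, the gain matrices passed to `modulate_new_information` can be the pre-computed ones, which is the advantage noted above for modulating the new information instead of the dynamics noise covariance.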

52While Lund’s [123, 124] work was with continuous-time systems with sampled measurements,
we are assuming that we have a discrete-time system (or at least we are using an equivalent discrete-time
model for a continuous system) and thus the rest of the development will follow Miller’s work
[149].

53For a discussion on the statistics of the new information, see for example Section 5.4 of Maybeck
[129].

2.7.5  Probability-Based Discretization Method.  The IRDF, discussed in
the  previous  section,  worked  to  ensure  stable  probability  flow  in  the  filter  bank  by 
maintaining  adequate  distinguishability  among  the  elemental  filters  in  a  relatively 
fixed  bank.  In  this  section,  we  introduce  a  technique  aimed  at  enhanced  “motion” 
and  “sizing”  of  the  filter  bank  for  a  moving-bank  MMAE.  Recall  that  the  primary 
advantage  of  the  moving  bank  is  that  potentially  fewer  filters  are  required  to  identify 
the parameter or system mode. The central concept employed by the probability-based
discretization method (PBDM) [198, 200, 201] is to determine the parameter
values  for  the  elemental  (Kalman)  filters  based  on  the  calculation  of  the  probability 
$\Pr(\chi_k^2 \le T_{\chi^2})$, where the chi squared random variable, $\chi_k^2$, is defined as

\[ \chi_k^2 = \mathbf{r}_k^T \mathbf{A}_k^{-1} \mathbf{r}_k \]  (2.96)

which is the same as the familiar likelihood quotient, $L_k$, previously defined in Equation
(2.46), for threshold $T_{\chi^2}$. In most cases, $\chi_k^2$ is a generalized chi squared random
variable since the filter-computed residual covariance is only equal to the true residual
covariance when the model matches the real world (truth) condition, i.e., only
when $\mathbf{a}_k = \mathbf{a}_t$ will we have a chi squared random variable. When $\chi_k^2 \approx m$, where m
is the length of the residual vector, then we see that the filter model is a good match
to the true condition, while $\chi_k^2 \gg m$ is an indication of a poor match with the
real  world  condition.  Thus,  rather  than  using  the  ad  hoc  movement  and  resizing 
rules  discussed  earlier,  this  algorithm  attempts  to  rediscretize  the  parameter  space 
dynamically  to  achieve  dynamic  movement  and  sizing  of  the  filter  bank  using  the 
information  already  available  to  the  algorithm. 
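The quadratic form of Equation (2.96) and the rule of thumb that it averages to roughly m for a matched filter can be illustrated numerically. The sketch below is not the PBDM itself; it draws synthetic residuals with identity covariance (an assumption of the example) and compares a filter whose computed covariance matches with one whose computed covariance is too small by a factor of ten.

```python
import numpy as np


def chi_squared_statistic(r, A):
    """r^T A^{-1} r for residual r and filter-computed residual covariance A."""
    return float(r @ np.linalg.solve(A, r))


rng = np.random.default_rng(0)
m = 2                                        # length of the residual vector
residuals = rng.standard_normal((5000, m))   # synthetic residuals, true covariance = I

A_matched = np.eye(m)            # filter-computed covariance equals the truth
A_mismatched = 0.1 * np.eye(m)   # filter believes the residuals are much smaller

mean_matched = np.mean([chi_squared_statistic(r, A_matched) for r in residuals])
mean_mismatched = np.mean([chi_squared_statistic(r, A_mismatched) for r in residuals])
print(mean_matched, mean_mismatched)
```

The matched average sits near m = 2, while the mismatched average is about ten times larger, which is the kind of separation the method relies on when deciding where to place filters.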

2.7.6  Residual Correlation Kalman Filter Bank.  Hanlon [71, 73] harnessed
the power of the subliminal dither, previously discussed in Section 2.4.11, using a
new hypothesis conditional probability calculation54 to maximize the observability
of  failed  flight  control  actuators.  Since  the  dither  is  a  highly  time-correlated  (usually 
periodic)  signal,  it  can  be  used  to  mark  the  filters  that  are  based  on  a  poor  model  with 
respect  to  the  real  world  with  a  spike  at  the  known  dither  frequency  in  the  residual 
sequence  PSD  plot.  While  the  standard  MMAE  does  not  exploit  the  whiteness  of  the 
residual  sequence  for  a  properly  matched  system  model,  Hanlon’s  algorithm  directly 
harnesses  this  knowledge.  The  purposeful  dither  appearing  in  the  measurement  is 
canceled  by  a  dither  in  the  predicted  measurement  which  matches  the  real  world 
condition,  while  an  incorrectly  predicted  measurement  will  not  cancel  the  dither 
present  in  the  actual  sensor  output.  Thus,  the  end  result  is  that  the  elemental  filter 
which  matches  the  real  world  failure  will  have  zero-mean  white  residuals,  while  the 
filters  that  don’t  match  the  real  world  failure  will  pass  the  dither  at  the  chosen 
frequency. 

An  algorithm  which  utilizes  the  purposeful  dither  is  the  residual  correlation 
Kalman  filter  bank  (RCKFB)  technique;  this  method  is  based  on  a  time-invariant 
system  model  for  the  elemental  filters  as  shown  in  Equations  (2.40)  and  (2.41). 
The residual sequence is assumed to be ergodic55 so that the periodogram can be
used to estimate the PSD [99]. The hypothesis conditional PDF employed by the
RCKFB hypothesis testing algorithm is formed using different components for the
residual vector ($\mathbf{r}$) and covariance matrix ($\mathbf{A}$). The measurement is replaced by the
estimated PSD of the measurement residual ($\hat{\Psi}$), while the analog of the predicted
measurement is the conditional mean of the PSD measurement residual ($\bar{\Psi}$). The
residual is given by the difference of the estimated PSD and the conditional mean
of the PSD evaluated at the known dither frequency: $r_\Psi = \hat{\Psi} - \bar{\Psi}$. The covariance

54The standard hypothesis testing algorithm is described in Section 2.3.3.3 with Equations (2.31)
through (2.46).

55An ergodic sequence has the property that moments such as the mean and correlation can be
computed based on the time average over a single sequence rather than over a set of realizations (a
statistical average). Ergodic sequences are a subset of stationary sequences.

matrix of the estimated PSD ($\mathbf{A}_\Psi$) is used in place of the filter-computed residual
covariance matrix.

The  RCKFB  employs  the  same  basic  testing  algorithm  but  with  a  different 
hypothesis  conditional  PDF  formed  by  exchanging  a  few  terms,  as  indicated  in  the 
previous  paragraph.  With  these  modifications,  it  is  an  MMAE  method  to  find  the 
large  PSD  content  at  the  known  frequency  of  the  dither.  The  drawback  of  the 
RCKFB  is  that  it  takes  slightly  longer  to  identify  the  failure,  but  the  advantage  is 
that  the  amplitude  of  the  purposeful  dither  signal  injected  into  the  flight  control 
actuators  can  be  subliminal.  Hence  a  combination  of  the  standard  MMAE  and  the 
MMAE  with  a  RCKFB  may  prove  useful. 
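The idea that a mismatched filter passes the dither while a matched filter leaves white residuals can be sketched with a plain periodogram. This is not Hanlon's RCKFB algorithm: the sample rate, dither frequency, and dither amplitude below are invented for the illustration, the amplitude is far from subliminal so that a single short record shows the effect, and the residuals are synthetic.

```python
import numpy as np


def periodogram(x, fs):
    """Periodogram PSD estimate of a real sequence x on the nonnegative frequency bins."""
    n = len(x)
    X = np.fft.rfft(x)
    psd = (np.abs(X) ** 2) / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


def psd_at(x, fs, f0):
    """Periodogram value at the bin closest to frequency f0."""
    freqs, psd = periodogram(x, fs)
    return psd[np.argmin(np.abs(freqs - f0))]


fs, n, f_dither = 100.0, 1000, 5.0           # hypothetical sample rate, length, dither (Hz)
t = np.arange(n) / fs
rng = np.random.default_rng(0)

matched = rng.standard_normal(n)             # dither cancelled: white residual
mismatched = rng.standard_normal(n) + 0.5 * np.sin(2 * np.pi * f_dither * t)

psd_matched = psd_at(matched, fs, f_dither)
psd_mismatched = psd_at(mismatched, fs, f_dither)
print(psd_matched, psd_mismatched)
```

A hypothesis test built on these values would compare the estimated PSD at the dither frequency against its expected value under each hypothesis, which is the role played by the PSD residual described above.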

2.7.7  Modified Multiple Model Adaptive Estimation.  In traditional MMAE
architectures, the designer has had to make a tradeoff decision between an optimal
design for state estimation and a design concerned with accurate parameter estimation
[130, 124]. Miller, however, developed the Modified MMAE (M3AE) architecture
[149] that exploits the benefits of an MMAE designed for accurate parameter estimation
and that performs at least as well in state estimation precision as an MMAE
designed specifically for accurate state estimation. This architecture offers enhanced
design flexibility in optimizing each estimator for its intended purpose. The MMAE
portion of the M3AE is designed for parameter estimation. The elemental filters of
the MMAE are designed and tuned such that the resulting hypotheses are as distinguishable
as possible from each other. This increases the MMAE’s ability to detect
parameter changes in the system accurately. The estimated parameter from the
standard MMAE framework is then fed into a single Kalman filter designed to accept
the parameter estimate. This filter is designed for accurate state estimation
conditioned on the measurements and knowledge of the parameters provided by the
parameter estimator. Miller [149, 148] found that the M3AE performed better than
the standard MMAE when blending occurred between two or more elemental filters
in the parameter estimator portion of the M3AE. Thus, the parameter estimate $\hat{\mathbf{a}}$
from the MMAE within the M3AE algorithm would be different from (and better
than) any of the hypothesized $\mathbf{a}_k$ values used as a basis for that MMAE’s elemental
filters. To ensure that the parameter estimate is a good blend of two or more parameter
values, we need to have a fairly fine discretization of the parameter set so that a
single elemental filter does not absorb all of the probability weight. The moving-bank
MMAE design fits this scenario quite well and, in fact, the M3AE doesn’t typically
provide any substantial improvements in performance without a moving bank of
filters.
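The blending mentioned above can be pictured as a probability-weighted average of the hypothesized parameter values, which is the usual form of the MMAE parameter estimate. The sketch below assumes that form; the probabilities and parameter values are made up, and the function name is illustrative.

```python
import numpy as np


def blended_parameter_estimate(probabilities, hypotheses):
    """Probability-weighted average of the hypothesized parameter values."""
    p = np.asarray(probabilities, dtype=float)
    a = np.asarray(hypotheses, dtype=float)
    p = p / p.sum()          # guard against weights that do not sum exactly to one
    return p @ a


# Three elemental filters with scalar parameter hypotheses.
p_k = [0.1, 0.6, 0.3]
a_k = [1.0, 2.0, 3.0]
a_hat = blended_parameter_estimate(p_k, a_k)
print(a_hat)
```

With the weight spread over several filters the estimate falls between the hypothesized values; if one filter absorbed all of the probability, the estimate would collapse onto that filter's hypothesis, which is why a fine discretization matters here.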

2.8  A  Final  Note 

The goal of this background chapter was to prepare the reader for the research
in the subsequent chapters. We have satisfied that requirement; however, there is
a plethora of advanced techniques and applications that we didn’t have space to
discuss. The interested reader is encouraged to dive into the literature to search out
those advanced techniques in order to extend the research discussed in this document.

III.  Infinite-Dimensional Sampled-Data Kalman Filter


3.1  Introduction

The original intent of multiple model adaptive estimation (MMAE) [125] was to
extend the applicability of the linear discrete-time Kalman filter [95]1 to problems in
which there was uncertainty in the strength of the dynamics noise model. Subsequent
work extended the applicability of MMAE to the other parameters characterizing
the structure of the model and the statistics of the noise models [108, 129, 196, 132]
as reviewed in Chapter II. In this chapter we shall extend the linear sampled-data
Kalman filter to allow an infinite-dimensional state space description [38, 39, 40],
thereby creating the infinite-dimensional sampled-data Kalman filter (ISKF). The
ISKF formulation will enable us to apply many of the tools of Kalman filtering previously
applied to finite-dimensional, lumped parameter systems described by a vector
stochastic differential equation (SDE) to systems best described with distributed parameters
[163] or time-delayed measurements, using a stochastic partial differential
equation (SPDE) [38, 87, 40] or a stochastic retarded differential equation (SRDE)
[102, 38, 18, 40, 103].

It is well known that the Kalman filter optimally combines the state estimate
from a static (sampled-data measurement update) minimum variance unbiased
(MVU) estimator with state predictions based on a presumed dynamics model to
estimate the state recursively [91, 129, 3, 29, 14]. The development of our ISKF
extension of Kalman’s filter [95] was primarily influenced by the presentations given
by McGarty [141], Luenberger [122], Catlin [33], and Scharf [170], augmented by the
infinite-dimensional linear systems theory reported by Curtain and Pritchard [38]
and  the  more  general  framework  of  stochastic  equations  in  infinite  dimensions  by 

1Kalman’s seminal work [95] has been republished in collections edited by Sorenson [183] and
Başar [12].




Da Prato and Zabczyk [40]. Additionally, most of the linear operator and transformation
theory was gleaned from Naylor and Sell [154] as well as an introduction to
topics  in  probability  and  measure  theory  which  were  more  fully  studied  in  Grigoriu 
[66]  and  Royden  [166],  respectively.  A  recent  text  by  Bobrowski  [24]  combines  the 
study  of  functional  analysis,  probability  theory,  and  stochastic  processes;  his  work 
aided  this  author  in  combining  these  concepts  in  a  coherent  fashion  in  the  pages  that 
follow. 

The following section briefly discusses several topics in mathematical and functional
analysis and probability theory. Most of the section is presented as definitions
(with a few lemmas which we will prove since they are not common) and a few well-known
theorems presented without proof. Next, we develop the new linear infinite-dimensional
MVU estimator (LIMVUE) and then we generalize the LIMVUE to allow
for recursive measurements. Then, we present an infinite-dimensional dynamics
model which leads us to the new ISKF. We conclude this chapter with the generalized
infinite-dimensional multiple model adaptive estimation (GIMMAE) framework.

3.2  Preliminary  Concepts 

We  have  assumed  that  the  reader  is  familiar  with  the  abstract  mathematical 
concept  of  a  vector  space  and  in  particular  that  of  a  linear  vector  space,  hereafter 
referred  to  simply  as  a  linear  space.  However,  we  will  introduce  some  basic  definitions 
and  theorems  to  familiarize  the  reader  with  our  notation  and  conventions.  These 
results  are  well  known  and  are  thus  given  without  proof.  On  the  other  hand,  the 
proofs  in  the  following  sections  flow  more  smoothly  with  the  lemmas  that  we  propose 
and  prove  in  this  section.  The  order  in  which  we  present  the  following  concepts  is 
simply  as  we  need  them,  thus  dependent  concepts  are  presented  after  the  independent 
concepts;  e.g.,  an  inner  product  and  inner  product  space  are  defined  before  we  talk 
about a Hilbert space.

Figure 3.1  Boxology of a Transformation or Mapping.

In this dissertation, we often use a graphical depiction of the mappings employed
in this research, referred to as the boxology of the mapping by Oxley [158].
These boxology diagrams help us to understand the relationships between the inputs,
transformations, and outputs, and in particular, these diagrams identify the spaces
in which the inputs, transformations, and outputs reside. This is in contrast to many
control system and/or circuit diagrams that focus on the input/output or transfer
function relationship itself. Figure 3.1 illustrates a simple example of the boxology
employed in this dissertation and the following definition explains the notation.

Definition 1 (Transformations and Operators) Let $X$ and $Y$ be vector spaces
over the same field ($\mathbb{R}$ or $\mathbb{C}$). Let $T : X \to Y$ represent a mapping from vector space
$X$ to $Y$; then $T$ is a transformation, $\mathcal{T}(X, Y)$ denotes the set of transformations
from $X$ to $Y$, and $T \in \mathcal{T}(X, Y)$; this space of transformations is graphically depicted
using the dashed box in Figure 3.1. If $T : Y \to Y$ is a mapping from $Y$ to itself, then
we write that $T \in \mathcal{O}(Y)$, i.e., $T$ is an operator.

Now  we  shall  add  the  structure  of  linearity  to  our  mappings  in  the  following 
definitions. 

Figure 3.2  Boxology of a Linear Transformation.

Definition 2 (Linear Transformations and Operators) Let $X$ and $Y$ be vector
spaces over the same field ($\mathbb{R}$ or $\mathbb{C}$) and let $\alpha$ be a scalar. Let $L : X \to Y$ represent
a mapping from vector space $X$ to $Y$. $L$ is a linear transformation if [154]:

1. Scalar multiplication: $L(\alpha x) = \alpha L(x)$ for all $x \in X$ and all scalars $\alpha$, and

2. Vector addition: $L(x_1 + x_2) = L(x_1) + L(x_2)$ for all $x_1, x_2 \in X$.

Otherwise, $L$ is a nonlinear transformation. Notationally, $\mathcal{LT}(X, Y)$ denotes the
set of linear transformations from $X$ to $Y$, and $L \in \mathcal{LT}(X, Y)$; this relationship is
illustrated in Figure 3.2. If $L : Y \to Y$ is a linear mapping from $Y$ to itself, then we
write that $L \in \mathcal{LO}(Y)$, i.e., $L$ is a linear operator.
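The two conditions of Definition 2 can be spot-checked numerically for finite-dimensional maps. The helper below is only an illustrative sketch (the name `is_linear` and the test maps are assumptions of the example); passing a finite number of random trials is a necessary check, not a proof of linearity.

```python
import numpy as np


def is_linear(L, dim, trials=100, tol=1e-9, seed=0):
    """Spot-check homogeneity and additivity of L on random vectors in R^dim."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x1, x2 = rng.standard_normal(dim), rng.standard_normal(dim)
        alpha = rng.standard_normal()
        if not np.allclose(L(alpha * x1), alpha * L(x1), atol=tol):
            return False  # scalar multiplication condition fails
        if not np.allclose(L(x1 + x2), L(x1) + L(x2), atol=tol):
            return False  # vector addition condition fails
    return True


M = np.array([[1.0, 2.0], [0.0, 3.0], [4.0, 0.0]])


def matrix_map(x):
    """Linear map from R^2 to R^3."""
    return M @ x


def affine_map(x):
    """Same map shifted by a constant, which breaks both conditions."""
    return M @ x + 1.0


print(is_linear(matrix_map, 2), is_linear(affine_map, 2))  # True False
```

The affine map is a useful counterexample because it looks linear on a plot yet fails both conditions as soon as the origin is not mapped to the origin.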

A linear functional is a special type of linear transformation that maps the
vector space over a scalar field to that scalar field, typically the real numbers, $\mathbb{R}$.
We use several linear functionals in this research; the inner product (with one argument
held fixed), defined later in this section, is one such example of a linear functional.

Definition 3 (Linear Functional) Let $Y$ be a linear space over $\mathbb{R}$ or $\mathbb{C}$. A linear
functional over either field ($\mathbb{R}$ or $\mathbb{C}$) simply maps (or transforms) a linear space to
that field ($\mathbb{R}$ or $\mathbb{C}$) [154]. We often denote the linear functional over $\mathbb{R}$ using the
familiar notation for a function: $\ell(\cdot)$, where the value $\ell(y) \in \mathbb{R}$ for any $y \in Y$ [158].

The  most  general  type  of  topological  space  that  we  will  employ  in  this  research 
is  a  metric  space;  for  our  purposes,  a  metric  space  is  defined  as  follows: 

Definition 4 (Metric Space) The pair $(X, d)$ is a metric space when $X$ is a set
and $d$ is a real-valued function, a “distance” metric, defined for $x, y \in X$, which
adheres to the following axioms [154]:

1. Positive: $d(x, y) \ge 0$ and $d(x, x) = 0$ for all $x, y \in X$.

2. Strictly positive: If $d(x, y) = 0$, then $x = y$ for all $x, y \in X$.

3. Symmetry: $d(x, y) = d(y, x)$ for all $x, y \in X$.

4. Triangle inequality: $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in X$.

Example. Banach and Hilbert spaces are examples of metric spaces with the function
$d$ being the appropriate norm (used as the metric) of the difference of two vectors.

The  norm  of  the  difference  of  element  pairs  in  a  vector  space  is  an  example  of 
a  metric.  When  the  norm  is  paired  with  a  linear  vector  space,  it  forms  a  normed 
linear  space. 

Definition 5 (Normed Linear Space) The linear space $Y$ is a normed linear
space if there is a real-valued function, $\|\cdot\|$, which maps each $y \in Y$ to a real
number $\|y\| \in \mathbb{R}$. The norm is a “distance” metric that must obey the following
axioms [154]:

1. Nonnegativity: $\|y\| \ge 0$ for all $y \in Y$.

2. Positive definiteness: $\|y\| = 0$ if and only if $y = 0$.

3. Triangle inequality: $\|y_1 + y_2\| \le \|y_1\| + \|y_2\|$ for each $y_1, y_2 \in Y$.

4. Scalar multiplication: $\|\alpha y\| = |\alpha| \, \|y\|$ for all scalars $\alpha \in \mathbb{R}$ and each $y \in Y$.

A normed linear space is denoted by $(Y, \|\cdot\|)$ or more simply as $Y$ when the associated
norm is understood.

To avoid confusion when dealing with multiple normed linear spaces and their associated
norms, we often add a subscript to the norm notation; in this case, a normed
linear space is denoted by $(Y, \|\cdot\|_Y)$.

Definition 6 (Bounded Linear Transformation) Let $B : X \to Y$ be a linear
transformation. When $X$ and $Y$ are normed linear spaces, then the linear transformation
$B$ is bounded if there is a real number $M > 0$ such that

\[ \|Bx\|_Y \le M \|x\|_X \quad \text{for all } x \in X \]  (3.1)

Let $\mathcal{BLT}(X, Y)$ denote the set of bounded linear transformations (BLTs) from $X$ to
$Y$; thus, $B \in \mathcal{BLT}(X, Y)$, i.e., $B$ is an element of the set of BLTs from $X$ to $Y$ [154].
When $B$ is an operator, i.e., when $Y = X$, then we use the notation $B \in \mathcal{BLO}(X)$
to denote that $B$ is a bounded linear operator (BLO).
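In finite dimensions every linear transformation is bounded, and with Euclidean norms the smallest constant that satisfies inequality (3.1) is the largest singular value of the matrix. The sketch below illustrates this for one hypothetical matrix; the matrix and the random test vectors are assumptions of the example.

```python
import numpy as np

# Hypothetical matrix representing a linear transformation from R^2 to R^3.
B = np.array([[3.0, 0.0],
              [4.0, 0.0],
              [0.0, 1.0]])

# Spectral norm: the largest singular value of B, used here as the bound M.
M_bound = np.linalg.norm(B, 2)

# Check ||Bx|| <= M ||x|| on random vectors.
rng = np.random.default_rng(1)
xs = rng.standard_normal((1000, 2))
ratios = np.linalg.norm(xs @ B.T, axis=1) / np.linalg.norm(xs, axis=1)

print(M_bound, ratios.max())
```

Here the bound equals 5 and is attained by the first unit vector, so no smaller constant would work. In infinite-dimensional spaces boundedness is no longer automatic, which is why the definition is stated explicitly.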

To combine vectors from different spaces, we employ the direct sum of the
spaces to add the vector components so that the new vector will be unique [122, 154].

Definition 7 (Direct Sum) Let $X$ and $Y$ be linear spaces over the same scalar
field. The direct sum of $X$ and $Y$, denoted by $X \oplus Y$, forms a new linear space. The
underlying set of $X \oplus Y$ is formed by the Cartesian product, $X \times Y$, of sets $X$ and $Y$.
A point in $X \times Y$ is an ordered pair $(x, y)$, where $x \in X$ and $y \in Y$. Four notable
properties of the direct sum are [154]:

1. Vector addition: $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$ for $x_1, x_2 \in X$ and
$y_1, y_2 \in Y$.

2. Scalar multiplication: $\alpha(x, y) = (\alpha x, \alpha y)$, for any scalar $\alpha$, and every $x \in X$
and $y \in Y$.

3. Origin: $(0, 0) \in X \times Y$.

4. Negative: $-(x, y) = (-x, -y)$ for every $x \in X$ and $y \in Y$.
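The four properties can be mirrored in code by representing an element of the direct sum as an ordered pair of arrays. This is a minimal sketch with illustrative helper names (`ds_add`, `ds_scale`, `ds_neg`, `ds_zero`), taking the two spaces to be real coordinate spaces of dimension 2 and 1 for the example.

```python
import numpy as np


def ds_add(u, v):
    """(x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)."""
    return (u[0] + v[0], u[1] + v[1])


def ds_scale(alpha, u):
    """alpha (x, y) = (alpha x, alpha y)."""
    return (alpha * u[0], alpha * u[1])


def ds_neg(u):
    """-(x, y) = (-x, -y)."""
    return ds_scale(-1.0, u)


def ds_zero(dim_x, dim_y):
    """Origin (0, 0) of the direct sum."""
    return (np.zeros(dim_x), np.zeros(dim_y))


u = (np.array([1.0, 2.0]), np.array([3.0]))
v = (np.array([0.5, -2.0]), np.array([1.0]))

w = ds_add(u, v)               # componentwise sum
z = ds_add(u, ds_neg(u))       # should equal the origin
print(w, z)
```

Keeping the two components separate, instead of concatenating them, preserves the information about which space each component came from.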

Now  that  we  have  a  method  for  uniquely  combining  the  vectors  from  multiple 
normed  linear  spaces,  we  shall  ascribe  several  useful  properties  to  the  normed  linear 
spaces  considered  in  this  research,  beginning  with  what  it  means  for  a  sequence  of 
vectors  in  a  normed  linear  space  to  converge. 

Definition 8 (Convergence) Let $(\mathbb{Y}, \|\cdot\|)$ be a normed linear space. An infinite
sequence of vectors $\{\mathbf{y}_1, \mathbf{y}_2, \ldots\} \subset \mathbb{Y}$ is said to converge to a vector
$\mathbf{y}$ if the sequence $\{\|\mathbf{y} - \mathbf{y}_1\|, \|\mathbf{y} - \mathbf{y}_2\|, \ldots\}$ of real numbers converges
to zero [154]. That is, given a real number $\epsilon > 0$, there exists an integer $N \in \mathbb{N}$
such that

$$\|\mathbf{y}_n - \mathbf{y}\| < \epsilon \tag{3.2}$$

for all $n > N$.


Definition 9 (Cauchy Sequence) Let $(\mathbb{Y}, \|\cdot\|)$ be a normed linear space. A
sequence of vectors $\{\mathbf{y}_1, \mathbf{y}_2, \ldots\} \subset \mathbb{Y}$ is called a Cauchy sequence if the
sequence $\{\|\mathbf{y}_m - \mathbf{y}_n\| : m, n \in \mathbb{N}\}$ of real numbers converges to zero, that is,
given a real number $\epsilon > 0$, there exists an integer $N \in \mathbb{N}$, where
$\mathbb{N} = \{1, 2, \ldots\}$ is the set of natural numbers, such that

$$\|\mathbf{y}_m - \mathbf{y}_n\| < \epsilon \tag{3.3}$$

for all $m, n > N$ [154].

Remark Note that every convergent sequence in a normed linear space is a Cauchy
sequence but that, in general, the converse is not true.


Definition  10  (Completeness)  The  normed  linear  space  Y  is  complete  if  every 
Cauchy  sequence  in  Y  is  a  convergent  sequence  in  Y  [154] . 
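To illustrate why completeness is a genuine restriction, the sketch below builds a Cauchy sequence of rational numbers whose limit, $\sqrt{2}$, is not rational. Treating the rationals (with the absolute value as norm) as a one-dimensional space over rational scalars is an assumption made for this illustration; the function name `newton_sqrt2` and the number of iterations are likewise hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates q_{k+1} = (q_k + 2/q_k)/2, starting from q_0 = 1."""
    q = Fraction(1)
    seq = [q]
    for _ in range(steps):
        q = (q + 2 / q) / 2
        seq.append(q)
    return seq

seq = newton_sqrt2(6)

# Gaps between successive terms shrink rapidly, as expected of a Cauchy sequence.
gaps = [abs(seq[k + 1] - seq[k]) for k in range(len(seq) - 1)]
gaps_shrink = all(gaps[k + 1] < gaps[k] for k in range(len(gaps) - 1))

# No rational term ever squares exactly to 2, so the limit lies outside the rationals.
never_exact = all(q * q != 2 for q in seq)

print(gaps_shrink, never_exact, float(seq[-1]))
```

Exact rational arithmetic is used so that the comparison `q * q != 2` is not affected by floating-point rounding.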

When  a  normed  linear  space  is  complete,  it  is  often  referred  to  by  the  special  name: 


Definition 11 (Banach Space) A complete normed linear space is also called a
Banach space [154].

Example. An important Banach space for our research is the space $L^p_{[a,b]}$ of Lebesgue
measurable functions² for $1 \le p < \infty$ with the finite norm [154]:

$$\|\mathbf{x}\| = \left( \int_a^b |x(\rho)|^p \, d\rho \right)^{1/p} \tag{3.4}$$

The Lebesgue functions are those functions, $\mathbf{x}$, defined on a closed real interval $[a, b]$
such that

$$\int_a^b |x(\rho)|^p \, d\rho < \infty \tag{3.5}$$

i.e., functions that are absolutely integrable (for $p = 1$), square integrable (for $p = 2$),
etc.

²To be precise, the equivalence class of Lebesgue measurable functions [154].
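The norm in Equation (3.4) can be approximated numerically for a concrete function. The sketch below uses a midpoint rule; the helper name `lp_norm`, the test function $x(\rho) = \rho$ on $[0, 1]$, and the number of subintervals are assumptions chosen only for illustration.

```python
def lp_norm(x, a, b, p, n=10_000):
    """Midpoint-rule approximation of (integral_a^b |x(rho)|^p d rho)^(1/p)."""
    h = (b - a) / n
    total = sum(abs(x(a + (k + 0.5) * h)) ** p for k in range(n))
    return (total * h) ** (1.0 / p)

# x(rho) = rho on [0, 1]: the exact norms are 1/2 for p = 1 and 1/sqrt(3) for p = 2.
l1 = lp_norm(lambda rho: rho, 0.0, 1.0, 1)
l2 = lp_norm(lambda rho: rho, 0.0, 1.0, 2)
print(l1, l2)
```

Both values are finite, so this function belongs to the $p = 1$ and $p = 2$ spaces on $[0, 1]$.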

We note that Banach spaces are important in optimization problems (such as
Kalman filtering) because the property of being a Cauchy sequence provides us a way
of determining whether a sequence of vectors in the Banach space does, indeed, converge in
the space even when the limit of the sequence is unknown [122]. Another important
property is the geometrical structure provided by an inner product. With the addition of this
structure, we define another linear vector space, the inner product space. In general,
an inner product space is defined over a scalar field, $F$. Usually, the field in question
is either the complex numbers, $\mathbb{C}$, or the real numbers, $\mathbb{R}$. In this research, we focus
on inner product spaces over the reals, $\mathbb{R}$.

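Definition 12 (Inner Product Space) Let $\mathbb{Y}$ be a linear space over $\mathbb{R}$. An inner
product on $\mathbb{Y}$ over $\mathbb{R}$ is a mapping that associates to each ordered pair of vectors $\mathbf{y}_1$
and $\mathbf{y}_2$ a real-valued scalar, denoted by $\langle \mathbf{y}_1, \mathbf{y}_2 \rangle$, that satisfies the following
properties for $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3 \in \mathbb{Y}$ [154]:

1. Additivity: $\langle \mathbf{y}_1 + \mathbf{y}_2, \mathbf{y}_3 \rangle = \langle \mathbf{y}_1, \mathbf{y}_3 \rangle + \langle \mathbf{y}_2, \mathbf{y}_3 \rangle$.

2. Homogeneity: $\langle \alpha\mathbf{y}_1, \mathbf{y}_2 \rangle = \alpha\langle \mathbf{y}_1, \mathbf{y}_2 \rangle$ for every $\alpha \in \mathbb{R}$.

3. Symmetry: $\langle \mathbf{y}_1, \mathbf{y}_2 \rangle = \langle \mathbf{y}_2, \mathbf{y}_1 \rangle$.

4. Positive definiteness: $\langle \mathbf{y}_1, \mathbf{y}_1 \rangle > 0$ when $\mathbf{y}_1 \ne \mathbf{0}$.

A linear space $\mathbb{Y}$ with an inner product $\langle\cdot,\cdot\rangle$, written as $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$, is known as an
inner product space.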

Note that the inner product generates a norm, $\|\mathbf{y}\| = \langle \mathbf{y}, \mathbf{y} \rangle^{1/2}$, which is often
called the induced norm on $\mathbb{Y}$. If we have an inner product space $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$, we often
abuse the notation and simply write $\mathbb{Y}$ for convenience. When it is uncertain upon
which space an inner product is defined, we will subscript the right angle bracket
with the name of the space, as in $\langle\cdot,\cdot\rangle_{\mathbb{Y}}$.

Example. The Euclidean space of real-valued $N$-vectors, $\mathbb{R}^N$, with inner product
defined as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \sum_{n=1}^{N} x_n y_n$ for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$, where the superscript $T$
represents the transpose operation, is an inner product space.
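The Euclidean inner product of this example, and the four properties listed in Definition 12, can be checked directly on small vectors. The following is a minimal sketch; the helper names `inner` and `induced_norm` and the particular vectors are assumptions introduced for illustration.

```python
import math

def inner(x, y):
    """Euclidean inner product <x, y> = x^T y."""
    return sum(a * b for a, b in zip(x, y))

def induced_norm(x):
    """Norm induced by the inner product: ||x|| = <x, x>^(1/2)."""
    return math.sqrt(inner(x, x))

x = [3.0, 4.0, 0.0]
y = [-4.0, 3.0, 2.0]
w = [1.0, 1.0, 1.0]

x_plus_y = [a + b for a, b in zip(x, y)]
two_x = [2.0 * a for a in x]

additive = inner(x_plus_y, w) == inner(x, w) + inner(y, w)
homogeneous = inner(two_x, w) == 2.0 * inner(x, w)
symmetric = inner(x, y) == inner(y, x)
positive = inner(x, x) > 0

print(additive, homogeneous, symmetric, positive, induced_norm(x))
```

The entries are chosen to be integer-valued so that the floating-point comparisons are exact.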

From the geometrical point of view, the inner product is a tool for comparing
the relative “directions” of two vectors. For vectors of fixed length, the inner product
is maximized when the two vectors are completely aligned, whereas vectors that are
maximally skewed have an inner product of zero and are said to be perpendicular, or
orthogonal.

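Definition 13 (Orthogonality) Let $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$ be an inner product space over $\mathbb{R}$.
The vectors $\mathbf{y}_1$ and $\mathbf{y}_2$ are said to be orthogonal if their inner product is zero, i.e.,
if $\langle \mathbf{y}_1, \mathbf{y}_2 \rangle = 0$; thus we write $\mathbf{y}_1 \perp \mathbf{y}_2$, where the perpendicular symbol $\perp$ reflects the
geometrical interpretation [154]. Additionally, if $\mathbb{A}$ and $\mathbb{B}$ are two specific subsets of
$\mathbb{Y}$, and $\langle \mathbf{a}, \mathbf{b} \rangle = 0$ for every $\mathbf{a} \in \mathbb{A}$ and $\mathbf{b} \in \mathbb{B}$, then these sets are orthogonal and
we denote this by $\mathbb{A} \perp \mathbb{B}$ [154]. Furthermore, if $(\mathbb{A}, \langle\cdot,\cdot\rangle)$ and $(\mathbb{B}, \langle\cdot,\cdot\rangle)$ are subspaces
of $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$ and all of the subsets of $\mathbb{A}$ and $\mathbb{B}$ are orthogonal, then we say that the
spaces are orthogonal and we denote this relation by $\mathbb{A} \perp \mathbb{B}$ [154].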

While the inner product of two vectors (on an inner product space) produces
a scalar, the outer product of these two vectors defines a linear transformation.
Some authors [37] refer to the operator created by the outer product as a rank-one
operator; this is analogous to the creation of a rank-one matrix by the outer product
of finite-dimensional vectors.

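Definition 14 (Outer Product) Let $\mathbb{X}$ and $\mathbb{Y}$ be inner product spaces. The outer
product of two vectors $\mathbf{x} \in \mathbb{X}$ and $\mathbf{y} \in \mathbb{Y}$, denoted by $\mathbf{x} \circ \mathbf{y}$, is defined in terms of the
inner product on $\mathbb{Y}$ as [38, 40]

$$(\mathbf{x} \circ \mathbf{y})\mathbf{z} = \mathbf{x}\langle \mathbf{y}, \mathbf{z} \rangle, \quad \text{for every } \mathbf{z} \in \mathbb{Y} \tag{3.6}$$

The outer product forms a linear transformation, that is, $\mathbf{x} \circ \mathbf{y} \in \mathcal{LT}(\mathbb{Y}, \mathbb{X})$ [40].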

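Example. Let $\mathbf{x}$ and $\mathbf{y}$ be continuous real-valued functions defined on the interval $[a, b]$
and denoted by $\mathbf{x}, \mathbf{y} \in C([a, b])$. The outer product is defined as $(\mathbf{x} \circ \mathbf{y})\mathbf{z} = \mathbf{x}\langle \mathbf{y}, \mathbf{z} \rangle$
for all $\mathbf{z} \in C([a, b])$. Thus, for all $\mathbf{z} \in C([a, b])$,

$$[(\mathbf{x} \circ \mathbf{y})\mathbf{z}](\rho) = x(\rho) \int_a^b y(\rho') z(\rho') \, d\rho' \tag{3.7}$$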

The  following  technical  lemma  treats  the  case  of  an  outer  product  when  the 
second  space  is  finite-dimensional: 

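Lemma 15 Let $\mathbb{X}$ and $\mathbb{Y} = \mathbb{R}^N$ be inner product spaces. For vectors $\mathbf{x} \in \mathbb{X}$ and
$\mathbf{y} \in \mathbb{Y}$, the outer product can be represented as

$$\mathbf{x} \circ \mathbf{y} = \mathbf{x}\mathbf{y}^T \tag{3.8}$$

Proof of Lemma 15 We begin by using the definition of the outer product such
that for every $\mathbf{z} \in \mathbb{Y}$ the following relation holds

\begin{align}
(\mathbf{x} \circ \mathbf{y})\mathbf{z} &= \mathbf{x}\langle \mathbf{y}, \mathbf{z} \rangle \tag{3.9} \\
&= \mathbf{x}(\mathbf{y}^T\mathbf{z}) \tag{3.10} \\
&= (\mathbf{x}\mathbf{y}^T)\mathbf{z} \tag{3.11}
\end{align}

Therefore, Equation (3.8) follows since Equation (3.11) holds for all $\mathbf{z} \in \mathbb{Y}$. ■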
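The identity of Lemma 15 is easy to confirm numerically for small real vectors. The sketch below forms the matrix $\mathbf{x}\mathbf{y}^T$ and compares its action on a vector $\mathbf{z}$ with $\mathbf{x}\langle \mathbf{y}, \mathbf{z} \rangle$; the helper names and the sample vectors are assumptions made for this illustration.

```python
def inner(y, z):
    """Euclidean inner product <y, z>."""
    return sum(a * b for a, b in zip(y, z))

def outer(x, y):
    """Matrix x y^T, stored as a list of rows."""
    return [[xi * yj for yj in y] for xi in x]

def matvec(A, z):
    """Matrix-vector product A z."""
    return [sum(a * c for a, c in zip(row, z)) for row in A]

x = [1.0, 2.0]          # x in X = R^2
y = [3.0, 0.0, -1.0]    # y in Y = R^3
z = [2.0, 5.0, 4.0]     # arbitrary test vector in Y

lhs = matvec(outer(x, y), z)          # (x y^T) z, as in Equation (3.11)
rhs = [xi * inner(y, z) for xi in x]  # x <y, z>, as in Equation (3.9)
print(lhs, rhs)
```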

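The next lemma shows that the outer product has both the right and left
distributive property.

Lemma 16 Let $\mathbb{X}$ and $\mathbb{Y}$ be inner product spaces. (i). For vectors $\mathbf{x} \in \mathbb{X}$ and
$\mathbf{y}_1, \mathbf{y}_2 \in \mathbb{Y}$, the right distributive property for the outer product is

$$\mathbf{x} \circ (\mathbf{y}_1 + \mathbf{y}_2) = (\mathbf{x} \circ \mathbf{y}_1) + (\mathbf{x} \circ \mathbf{y}_2) \tag{3.12}$$

(ii). For vectors $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{X}$ and $\mathbf{y} \in \mathbb{Y}$, the left distributive property for the outer
product is

$$(\mathbf{x}_1 + \mathbf{x}_2) \circ \mathbf{y} = (\mathbf{x}_1 \circ \mathbf{y}) + (\mathbf{x}_2 \circ \mathbf{y}) \tag{3.13}$$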

Remark  Vector  outer  product  takes  precedence  over  vector  addition;  hence  the 
parentheses  on  the  right-hand  sides  of  Equations  (3.12)  and  (3.13),  while  unnecessary, 
are  included  for  added  clarity. 

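Proof of Lemma 16 (i). Begin by using the definition of the outer product for
arbitrary $\mathbf{x} \in \mathbb{X}$ and $\mathbf{y}_1, \mathbf{y}_2, \mathbf{z} \in \mathbb{Y}$:

\begin{align}
[\mathbf{x} \circ (\mathbf{y}_1 + \mathbf{y}_2)]\mathbf{z} &= \mathbf{x}\langle \mathbf{y}_1 + \mathbf{y}_2, \mathbf{z} \rangle \tag{3.14} \\
&= \mathbf{x}(\langle \mathbf{y}_1, \mathbf{z} \rangle + \langle \mathbf{y}_2, \mathbf{z} \rangle) \tag{3.15} \\
&= \mathbf{x}\langle \mathbf{y}_1, \mathbf{z} \rangle + \mathbf{x}\langle \mathbf{y}_2, \mathbf{z} \rangle \tag{3.16}
\end{align}

Using the definition of the outer product on the right-hand side of the equation again
yields

\begin{align}
[\mathbf{x} \circ (\mathbf{y}_1 + \mathbf{y}_2)]\mathbf{z} &= (\mathbf{x} \circ \mathbf{y}_1)\mathbf{z} + (\mathbf{x} \circ \mathbf{y}_2)\mathbf{z} \tag{3.17} \\
&= [(\mathbf{x} \circ \mathbf{y}_1) + (\mathbf{x} \circ \mathbf{y}_2)]\mathbf{z} \tag{3.18}
\end{align}

where the last equality follows because we have two linear transformations applied
to the same vector. Since Equation (3.18) holds for all $\mathbf{z} \in \mathbb{Y}$, we get

$$\mathbf{x} \circ (\mathbf{y}_1 + \mathbf{y}_2) = (\mathbf{x} \circ \mathbf{y}_1) + (\mathbf{x} \circ \mathbf{y}_2) \tag{3.12}$$

Thus property (i) of Lemma 16 holds.

(ii). Begin by using the definition of the outer product for arbitrary $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{X}$ and
$\mathbf{y}, \mathbf{z} \in \mathbb{Y}$:

\begin{align}
[(\mathbf{x}_1 + \mathbf{x}_2) \circ \mathbf{y}]\mathbf{z} &= (\mathbf{x}_1 + \mathbf{x}_2)\langle \mathbf{y}, \mathbf{z} \rangle \tag{3.19} \\
&= \mathbf{x}_1\langle \mathbf{y}, \mathbf{z} \rangle + \mathbf{x}_2\langle \mathbf{y}, \mathbf{z} \rangle \tag{3.20}
\end{align}

Then we use the definition of the outer product again to get

\begin{align}
[(\mathbf{x}_1 + \mathbf{x}_2) \circ \mathbf{y}]\mathbf{z} &= (\mathbf{x}_1 \circ \mathbf{y})\mathbf{z} + (\mathbf{x}_2 \circ \mathbf{y})\mathbf{z} \tag{3.21} \\
&= [(\mathbf{x}_1 \circ \mathbf{y}) + (\mathbf{x}_2 \circ \mathbf{y})]\mathbf{z} \tag{3.22}
\end{align}

where, once again, the last equality follows because we have two linear transformations
applied to the same vector. Since Equation (3.22) holds for all $\mathbf{z} \in \mathbb{Y}$, then

$$(\mathbf{x}_1 + \mathbf{x}_2) \circ \mathbf{y} = (\mathbf{x}_1 \circ \mathbf{y}) + (\mathbf{x}_2 \circ \mathbf{y}) \tag{3.13}$$

Thus property (ii) of Lemma 16 holds. ■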

For  the  next  lemma,  we  will  first  introduce  the  concept  of  an  adjoint  operator. 

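Definition 17 (Adjoint) Let $\mathbb{X}$ and $\mathbb{Y}$ be inner product spaces over the same field.
If $B \in \mathcal{BLT}(\mathbb{X}, \mathbb{Y})$, then the adjoint of $B$, denoted $B^*$, is also a BLT, $B^* : \mathbb{Y} \to \mathbb{X}$,
i.e., $B^* \in \mathcal{BLT}(\mathbb{Y}, \mathbb{X})$, defined such that the following holds true for all
$\mathbf{x} \in \mathbb{X}$ and $\mathbf{y} \in \mathbb{Y}$:

$$\langle B\mathbf{x}, \mathbf{y} \rangle_{\mathbb{Y}} = \langle \mathbf{x}, B^*\mathbf{y} \rangle_{\mathbb{X}} \tag{3.23}$$

Remark The adjoint of $B$ satisfies the property: $\|B^*\| = \|B\|$ [154].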

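Lemma 18 Let $\mathbb{U}$, $\mathbb{V}$, $\mathbb{X}$, and $\mathbb{Y}$ be inner product spaces over the same field. For
vectors $\mathbf{x} \in \mathbb{X}$ and $\mathbf{y} \in \mathbb{Y}$, and linear transformations $A \in \mathcal{LT}(\mathbb{X}, \mathbb{U})$ and
$B \in \mathcal{LT}(\mathbb{Y}, \mathbb{V})$,

$$(A\mathbf{x}) \circ (B\mathbf{y}) = A(\mathbf{x} \circ \mathbf{y})B^* \tag{3.24}$$

where $B^*$ is the adjoint of $B$.

Proof of Lemma 18 The outer product $(A\mathbf{x}) \circ (B\mathbf{y})$ is defined by

$$[(A\mathbf{x}) \circ (B\mathbf{y})]\mathbf{z} = A\mathbf{x}\langle B\mathbf{y}, \mathbf{z} \rangle \quad \text{for all } \mathbf{z} \in \mathbb{V} \tag{3.25}$$

Note that $\langle B\mathbf{y}, \mathbf{z} \rangle = \langle \mathbf{y}, B^*\mathbf{z} \rangle$. Thus, for all $\mathbf{z} \in \mathbb{V}$,

\begin{align}
[(A\mathbf{x}) \circ (B\mathbf{y})]\mathbf{z} &= A\mathbf{x}\langle \mathbf{y}, B^*\mathbf{z} \rangle \tag{3.26} \\
&= A(\mathbf{x} \circ \mathbf{y})B^*\mathbf{z} \tag{3.27}
\end{align}

Since Equation (3.27) holds for every $\mathbf{z} \in \mathbb{V}$, we get

$$(A\mathbf{x}) \circ (B\mathbf{y}) = A(\mathbf{x} \circ \mathbf{y})B^* \tag{3.24}$$

Therefore, the lemma holds. ■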
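For real Euclidean spaces the adjoint of a matrix is its transpose and, by Lemma 15, the outer product is $\mathbf{x}\mathbf{y}^T$, so Lemma 18 reduces to $(A\mathbf{x})(B\mathbf{y})^T = A(\mathbf{x}\mathbf{y}^T)B^T$. The sketch below checks this finite-dimensional special case; the matrices, vectors, and helper names are assumptions chosen for illustration, with integer entries so that the comparison is exact.

```python
def outer(x, y):
    """Matrix x y^T representing the outer product of real vectors."""
    return [[xi * yj for yj in y] for xi in x]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * c for a, c in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Transpose, i.e., the adjoint of a real matrix."""
    return [list(col) for col in zip(*A)]

A = [[1, 2], [0, 1]]        # A : X = R^2 -> U = R^2
B = [[1, 0, 2], [0, 1, 1]]  # B : Y = R^3 -> V = R^2
x = [1, -1]
y = [1, 2, 3]

lhs = outer(matvec(A, x), matvec(B, y))             # (Ax) o (By)
rhs = matmul(matmul(A, outer(x, y)), transpose(B))  # A (x o y) B*
print(lhs, rhs)
```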

As with the normed linear space, when we add completeness to the topology
of an inner product space, we give it a special name.

Definition 19 (Hilbert Space) A complete inner product space is also called a
Hilbert space [154].

Example. An important Hilbert space that we employ in our research is the space
of real-valued Lebesgue measurable and square integrable functions over the interval
$[a, b]$, denoted by $L^2_{[a,b]}$, as defined on page 3-7, with the inner product [122]

$$\langle \mathbf{x}, \mathbf{y} \rangle_{L^2_{[a,b]}} = \int_a^b x(\rho) y(\rho) \, d\rho \tag{3.28}$$

Now that we have defined the Hilbert space, is there another topological property
that we desire? Yes, we desire the topological property of separability. The
separability of a metric space (which includes the Hilbert space) says that we can
approximate any point in the space arbitrarily closely using a countable orthonormal basis
[154]; this is why we are interested in the separable Hilbert space. But before
we define the separable Hilbert space, we shall introduce two other topics: the
orthonormal set and the maximal orthonormal set.

Definition 20 (Orthonormal Set) Let $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$ be an inner product space. The
set $\{\mathbf{y}_i \in \mathbb{Y} : i \in \mathbb{N}\}$ is called an orthonormal set if every pair of elements satisfies
$\langle \mathbf{y}_i, \mathbf{y}_j \rangle = \delta_{ij}$ for all $i, j \in \mathbb{N}$, where $\delta_{ij}$ is the Kronecker delta function [154].

Definition 21 (Maximal Orthonormal Set) Let $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$ be an inner product
space. An orthonormal set $\mathbb{B} = \{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots\}$ is called a maximal orthonormal set if
there is no unit vector $\mathbf{y} \in \mathbb{Y}$ such that $\mathbb{B} \cup \{\mathbf{y}\}$ is an orthonormal set [154].

Definition 22 (Separable Hilbert Space) Let $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ be a Hilbert space. If
every orthonormal set (spanning set) is countable and there is a maximal orthonormal
set, then $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ is called a separable Hilbert space.

Using  these  definitions,  we  can  now  propose  a  useful  tool  for  representing  a 
vector  as  a  series  of  vectors. 

Theorem 23 (Fourier Series) If the set $\{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots\}$ is any maximal orthonormal
set and $\mathbf{y} \in \mathbb{H}$, then $\mathbf{y}$ can be expressed by the Fourier series as [154, 166]:

$$\mathbf{y} = \sum_{n=1}^{\infty} \alpha_n \boldsymbol{\beta}_n \tag{3.29}$$

where $\alpha_n = \langle \mathbf{y}, \boldsymbol{\beta}_n \rangle$ is the $n$th Fourier series coefficient. Moreover,
$\|\mathbf{y}\|^2 = \sum_{n=1}^{\infty} |\alpha_n|^2$.

Proof of Theorem 23 See Naylor and Sell [154].

The next five operator and/or transformation properties are defined to help us
describe the types of operators that the infinite-dimensional trace operator may act
upon to produce a useful scalar; these are the trace-class operators [36, 37, 165, 40].
The trace operator is of central importance in an identity used to relate the covariance
or correlation (which is defined using an outer product) to a particular inner product.
With that in mind, the first operator property that we discuss is a useful extension
of a familiar symmetry property of complex square matrices. We say that $M$ is
a Hermitian (symmetric) matrix if and only if $M^* = M \in \mathbb{C}^{N \times N}$, where $*$ is the
conjugate transpose; thus, $m_{pq} = m_{qp}^*$ for every element $m_{pq}$ of $M$. A generalization
to a Hilbert space follows.

Definition 24 (Self-Adjoint Operator) Let $B$ be a BLO on a Hilbert space. If
$B^* = B$, then $B$ is said to be a self-adjoint operator [154].

A slightly weaker property for (possibly unbounded) operators is given by the
following.

Definition 25 (Symmetric Operator) Let $L$ be a linear operator on a dense subspace
in an inner product space $(\mathbb{Y}, \langle\cdot,\cdot\rangle)$, i.e., the domain, $\mathcal{D}(L)$, is a dense set in
$\mathbb{Y}$. $L$ is said to be a symmetric operator if for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}(L)$, $\langle L\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, L\mathbf{y} \rangle$
[39].

Remark If the domain of the adjoint operator, $\mathcal{D}(L^*)$, satisfies $\mathcal{D}(L^*) = \mathcal{D}(L)$,
or if $L$ is bounded, then a symmetric operator is also self-adjoint [39].

The following lemma demonstrates that the outer product operator is symmetric
under certain circumstances.

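Lemma 26 Let $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ be a real Hilbert space. For every vector $\mathbf{u} \in \mathbb{H}$, the outer
product $\mathbf{u} \circ \mathbf{u}$ is a symmetric linear operator.

Proof of Lemma 26 We shall show that $L = \mathbf{u} \circ \mathbf{u}$ obeys $\langle L\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, L\mathbf{y} \rangle$ for
every $\mathbf{u} \in \mathbb{H}$. Let $\mathbf{x}, \mathbf{y} \in \mathbb{H}$. Then

\begin{align}
\langle (\mathbf{u} \circ \mathbf{u})\mathbf{x}, \mathbf{y} \rangle &= \langle \mathbf{u}\langle \mathbf{u}, \mathbf{x} \rangle, \mathbf{y} \rangle \tag{3.30} \\
&= \langle \mathbf{u}, \mathbf{x} \rangle \langle \mathbf{u}, \mathbf{y} \rangle \tag{3.31} \\
&= \langle \mathbf{u}, \mathbf{y} \rangle \langle \mathbf{u}, \mathbf{x} \rangle \tag{3.32}
\end{align}

where the first line employed the definition of the outer product. Now reinserting the
scalar $\langle \mathbf{u}, \mathbf{y} \rangle$ back into the inner product $\langle \mathbf{u}, \mathbf{x} \rangle$ yields

\begin{align}
\langle (\mathbf{u} \circ \mathbf{u})\mathbf{x}, \mathbf{y} \rangle &= \langle \mathbf{u}\langle \mathbf{u}, \mathbf{y} \rangle, \mathbf{x} \rangle \tag{3.33} \\
&= \langle (\mathbf{u} \circ \mathbf{u})\mathbf{y}, \mathbf{x} \rangle \tag{3.34} \\
&= \langle \mathbf{x}, (\mathbf{u} \circ \mathbf{u})\mathbf{y} \rangle \tag{3.35}
\end{align}

where the last line uses the symmetry of the real inner product.
Thus, the outer product operator, $\mathbf{u} \circ \mathbf{u}$, is a symmetric operator. ■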
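The chain of equalities (3.30) through (3.35) can be traced numerically in a real Euclidean space, which is a special case of a real Hilbert space. The following sketch applies the operator $\mathbf{u} \circ \mathbf{u}$ through its definition rather than through a matrix; the helper names and the integer test vectors are assumptions introduced for illustration.

```python
def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def outer_apply(x, y, z):
    """Apply the outer product to z using its definition: (x o y) z = x <y, z>."""
    s = inner(y, z)
    return [xi * s for xi in x]

u = [1, 2, -1]
x = [2, 0, 1]
y = [1, 1, 1]

lhs = inner(outer_apply(u, u, x), y)   # <(u o u) x, y>
rhs = inner(x, outer_apply(u, u, y))   # <x, (u o u) y>
mid = inner(u, x) * inner(u, y)        # <u, x><u, y>, as in Equation (3.31)
print(lhs, mid, rhs)
```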

Just  like  symmetric  square  matrices  (and  the  set  of  real  numbers),  operators 
may  possess  the  property  of  positiveness. 

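Definition 27 (Positive Operator) A self-adjoint BLO $B$ on a Hilbert space $\mathbb{H}$
is positive if

$$\langle \mathbf{x}, B\mathbf{x} \rangle \ge 0 \tag{3.36}$$

for all $\mathbf{x} \in \mathbb{H}$; this is denoted by $B \ge 0$. Similarly, the operator is strictly positive if

$$\langle \mathbf{x}, B\mathbf{x} \rangle > 0 \tag{3.37}$$

for all nonzero $\mathbf{x} \in \mathbb{H}$; this is denoted by $B > 0$ [154].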

Remark Note that the word “positive” is not used altogether consistently when
applied to operators and matrices by many authors and in this research. For example,
a real-valued matrix $D \in \mathbb{R}^{M \times M}$ is termed positive semi-definite when the inner
product $\langle \mathbf{x}, D\mathbf{x} \rangle$ is nonnegative for every vector $\mathbf{x} \in \mathbb{R}^M$, and positive definite when
the inner product is also nonzero, i.e., a strictly positive scalar, for all nonzero
vectors in $\mathbb{R}^M$ [189].

The next two properties apply to transformations (as well as operators). In
terms of its spectral properties, a compact transformation is almost as simple as a
matrix [154]. Recall that the eigenvalues of a matrix describe its spectrum and that
the trace of a matrix is the sum of those eigenvalues. Hence it seems reasonable
that a transformation that is “like” a matrix would be “traceable.” Additionally,
the set of nuclear transformations, which includes the covariance and correlation
transformations, is a subset of compact transformations.

Definition 28 (Compact Transformation) Let $\mathbb{X}$ and $\mathbb{Y}$ be two Banach spaces
over the same field. Let $L : \mathbb{X} \to \mathbb{Y}$ be a linear transformation that maps $\mathbb{X}$ to $\mathbb{Y}$. $L$
is said to be a compact transformation if $L(\mathbb{D})$ lies in a compact subset
of $\mathbb{Y}$, where $\mathbb{D} = \{\mathbf{x} \in \mathbb{X} : \|\mathbf{x}\| \le 1\}$ [154].

Remark Compactness is also referred to as complete continuity of a linear
transformation [43, 216].

Definition 29 (Nuclear Transformation) Let $\mathbb{X}$ and $\mathbb{Y}$ be Banach spaces over
the same field, $\mathbb{X}^*$ be the dual space of $\mathbb{X}$, i.e., the space of continuous linear
functionals defined on the space, and $\mathcal{BLT}(\mathbb{X}, \mathbb{Y})$ be a Banach space of BLTs from $\mathbb{X}$
into $\mathbb{Y}$. A transformation $B \in \mathcal{BLT}(\mathbb{X}, \mathbb{Y})$ is said to be nuclear if there exist two
sequences $\{a_1, a_2, \ldots\} \subset \mathbb{X}^*$ and $\{\mathbf{y}_1, \mathbf{y}_2, \ldots\} \subset \mathbb{Y}$ such that [165, 40]

$$\sum_j \|a_j\| \cdot \|\mathbf{y}_j\| < \infty \tag{3.38}$$

and $B$ is defined by

$$B(\mathbf{x}) = \sum_j \mathbf{y}_j \, a_j(\mathbf{x}), \quad \mathbf{x} \in \mathbb{X} \tag{3.39}$$

Remarks (1) The space of all nuclear transformations (NT) from $\mathbb{X}$ to $\mathbb{Y}$ is a
Banach space, denoted $\mathcal{NT}(\mathbb{X}, \mathbb{Y})$, with the following norm [40]

$$\|B\|_{\mathcal{NT}} = \inf \left\{ \sum_j \|a_j\| \cdot \|\mathbf{y}_j\| : B(\mathbf{x}) = \sum_j \mathbf{y}_j \, a_j(\mathbf{x}) \right\} \tag{3.40}$$

(2) Since all nuclear transformations are compact [216] and all compact transformations
are bounded and linear [154], nuclear transformations are necessarily compact,
bounded and linear.

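Lemma 30 Let $\mathbf{u} \in \mathbb{U}$ and $\mathbf{v}, \mathbf{w} \in \mathbb{V}$ be vectors in separable Hilbert spaces of
Lebesgue $L^2$ functions. The outer product $\mathbf{u} \circ \mathbf{v}$ is a nuclear transformation from
$\mathbb{V}$ to $\mathbb{U}$.

Proof of Lemma 30 From Definition 14, we know that the outer product $\mathbf{u} \circ \mathbf{v}$ is
defined by the relation $(\mathbf{u} \circ \mathbf{v})\mathbf{w} = \mathbf{u}\langle \mathbf{v}, \mathbf{w} \rangle$ for every $\mathbf{w} \in \mathbb{V}$. This matches the form
(rather trivially) for transformation $B$ in Equation (3.39), with a single term in which
$\mathbf{y}_1 = \mathbf{u}$ and $a_1(\cdot) = \langle \mathbf{v}, \cdot \rangle$. Since we have chosen
Hilbert spaces of Lebesgue $L^2$ functions, $\|\mathbf{u}\| \cdot \langle \mathbf{v}, \mathbf{w} \rangle$ is finite since both terms are
finite; likewise, $\|a_1\| \cdot \|\mathbf{y}_1\| = \|\mathbf{v}\| \cdot \|\mathbf{u}\|$ is finite, so Equation (3.38) is satisfied.
Hence, the outer product from one separable Hilbert space of Lebesgue $L^2$
functions to another creates a nuclear transformation. ■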

The trace of a matrix is the sum of the diagonal elements [129]. A deeper look
shows that the trace of the matrix is equal to the sum of the eigenvalues [189]. The
only criterion placed on the matrix is that it be square. To extend this concept to a
Hilbert space, we must add more constraints beyond the fact that a square matrix
corresponds to an operator. The trace operator may only be applied to trace-class
operators. Since nuclear operators are the primary trace-class operators of interest
to us in this research, we will forego a more in-depth discussion of the trace-class
operators; see, for example, references [37, 165, 40] for an extensive development.

Definition 31 (Trace Operator) (i). Let $L$ be a self-adjoint operator [154], a
compact positive operator [37], or a nuclear operator [40] on a Hilbert space $\mathbb{H}$; then
the trace of $L$ is given by

$$\operatorname{tr} L = \sum_n \langle L\boldsymbol{\beta}_n, \boldsymbol{\beta}_n \rangle \tag{3.41}$$

where $\{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots\}$ is any orthonormal basis on $\mathbb{H}$.

(ii). If $L$ is a compact self-adjoint positive operator [154] on $\mathbb{H}$, then the trace of $L$
is given by

$$\operatorname{tr} L = \sum_n \lambda_n \tag{3.42}$$

where $\{\lambda_1, \lambda_2, \ldots\}$ is the set of eigenvalues of $L$.

Remark  A  positive  BLO  defined  on  a  Banach  space  is  nuclear  if  and  only  if  the 
trace  of  the  operator  is  finite  [40]. 

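The linear space of real $N$-vectors, $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$, with associated inner product
defined as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y}$ and outer product $\mathbf{x} \circ \mathbf{y} = \mathbf{x}\mathbf{y}^T$, is a separable Hilbert
space. It is common knowledge that the trace of the outer product is equal to the
inner product³: $\operatorname{tr}(\mathbf{x}\mathbf{y}^T) = \mathbf{x}^T\mathbf{y}$. In the next lemma, we extend this trace operator
property for the case of (possibly) infinite-dimensional vectors in a separable Hilbert
space.

³This property of the trace is simply a specific case of a more general result [27]: $\operatorname{tr}(AB) =
\operatorname{tr}(BA)$ for appropriately dimensioned matrices $A$ and $B$.

Lemma 32 Let $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ be a separable Hilbert space of Lebesgue $L^2$ functions. The
trace of the outer product, $\operatorname{tr}(\mathbf{x} \circ \mathbf{y})$, is equal to the inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ for any
$\mathbf{x}, \mathbf{y} \in \mathbb{H}$, i.e.,

$$\operatorname{tr}(\mathbf{x} \circ \mathbf{y}) = \langle \mathbf{x}, \mathbf{y} \rangle \tag{3.43}$$

Proof of Lemma 32 From Lemma 30, we may employ Definition 31 to obtain, for
every $\mathbf{x}, \mathbf{y} \in \mathbb{H}$,

$$\operatorname{tr}(\mathbf{x} \circ \mathbf{y}) = \sum_n \langle (\mathbf{x} \circ \mathbf{y})\boldsymbol{\beta}_n, \boldsymbol{\beta}_n \rangle \tag{3.44}$$

where $\{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots\}$ is any orthonormal basis. Using the definition of the outer
product, we can rewrite Equation (3.44) as

$$\operatorname{tr}(\mathbf{x} \circ \mathbf{y}) = \sum_n \langle \mathbf{x}\langle \mathbf{y}, \boldsymbol{\beta}_n \rangle, \boldsymbol{\beta}_n \rangle \tag{3.45}$$

Factoring out the inner product yields

\begin{align}
\operatorname{tr}(\mathbf{x} \circ \mathbf{y}) &= \sum_n \langle \mathbf{x}, \boldsymbol{\beta}_n \rangle \langle \mathbf{y}, \boldsymbol{\beta}_n \rangle \tag{3.46} \\
&= \sum_n x_n y_n \tag{3.47} \\
&= \langle \mathbf{x}, \mathbf{y} \rangle \tag{3.48}
\end{align}

where we note that $\mathbf{x}$ and $\mathbf{y}$ can be decomposed using the same orthonormal basis
$\{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots\}$, so that $x_n = \langle \mathbf{x}, \boldsymbol{\beta}_n \rangle$ and $y_n = \langle \mathbf{y}, \boldsymbol{\beta}_n \rangle$, respectively. Finally, we
recognize that the sum in the third line is just the inner product. ■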
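In the finite-dimensional case, the steps of this proof can be reproduced numerically, including the claim that the result does not depend on which orthonormal basis is used. The sketch below evaluates the sum in Equation (3.44) for the standard basis of $\mathbb{R}^2$ and for a rotated basis; the rotation angle, the vectors, and the helper names are assumptions made for illustration.

```python
import math

def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def outer_apply(x, y, z):
    """Apply the outer product via its definition: (x o y) z = x <y, z>."""
    s = inner(y, z)
    return [xi * s for xi in x]

def trace_outer(x, y, basis):
    """Sum over n of <(x o y) beta_n, beta_n>, as in Equation (3.44)."""
    return sum(inner(outer_apply(x, y, b), b) for b in basis)

theta = 0.7  # arbitrary rotation angle for a second orthonormal basis
standard = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

x = [1.0, 2.0]
y = [3.0, -1.0]

t_std = trace_outer(x, y, standard)
t_rot = trace_outer(x, y, rotated)
print(t_std, t_rot, inner(x, y))
```

Both traces agree with the inner product up to floating-point rounding in the rotated case.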

For finite-dimensional vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$, we know that $\mathbf{x}$ and $\mathbf{y}$ are orthogonal,
by definition, whenever their inner product is zero: $\mathbf{x}^T\mathbf{y} = 0$. Additionally, it is true
that whenever their outer product is zero, i.e., $\mathbf{x}\mathbf{y}^T = \mathbf{0} \in \mathbb{R}^{N \times N}$, their inner
product is also zero ($\mathbf{x}^T\mathbf{y} = 0 \in \mathbb{R}$) and therefore $\mathbf{x}$ and $\mathbf{y}$ are orthogonal. Observe
that, since the trace of the zero outer product is trivially zero, the inner product is
necessarily zero as well. In the following theorem, we extend this notion of using
an outer product of vectors to identify the (geometrical) orthogonality of vectors in
a more general setting of a separable Hilbert space of Lebesgue $L^2$ functions. Note
that this theorem helps us to introduce, in a natural fashion, the indispensable
concept of statistical orthogonality that will be discussed in Definition 49.

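Theorem 33 (Orthogonal Vectors) Let $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ be a separable Hilbert space of
Lebesgue $L^2$ functions. Any two vectors, $\mathbf{x}, \mathbf{y} \in \mathbb{H}$, are orthogonal, i.e., $\langle \mathbf{x}, \mathbf{y} \rangle = 0$,
whenever

$$\mathbf{x} \circ \mathbf{y} = 0 \in \mathcal{LO}(\mathbb{H}) \tag{3.49}$$

Proof of Theorem 33 Applying the trace operator to both sides of Equation
(3.49) yields

\begin{align}
\operatorname{tr}(\mathbf{x} \circ \mathbf{y}) &= \operatorname{tr}(0) \tag{3.50} \\
\Rightarrow \operatorname{tr}(\langle \mathbf{x}, \mathbf{y} \rangle) &= 0 \tag{3.51} \\
\Rightarrow \langle \mathbf{x}, \mathbf{y} \rangle &= 0 \tag{3.52}
\end{align}

where the second line follows since the trace of the outer product (defined on this
separable Hilbert space of Lebesgue $L^2$ functions) is equal to the trace of the inner
product per Lemma 32, and the third line gives the obvious result that the trace of
a scalar is that scalar. Thus $\mathbf{x} \circ \mathbf{y} = 0$ implies that $\mathbf{x} \perp \mathbf{y}$ for $\mathbf{x}, \mathbf{y} \in \mathbb{H}$. ■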

Since the topological concept of a closed subspace is important to the Projection
Theorem, we will define it before we proceed further.

Definition 34 (Closed Subspace) If $(\mathbb{X}, d)$ and $(\mathbb{Y}, d)$ are metric spaces such that
$\mathbb{X} \subset \mathbb{Y}$, then we call $(\mathbb{X}, d)$ a subspace of $(\mathbb{Y}, d)$. The subspace $(\mathbb{X}, d)$ is closed if and
only if every convergent sequence of vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots\}$ in $\mathbb{X}$ has its limit in $\mathbb{X}$.

The following two theorems are well-known and are thus stated without proof.
The first theorem is a prelude to the projection theorem and it establishes the uniqueness
of a vector that produces an error vector that is orthogonal to the subspace of
interest.

Theorem 35 Let $\mathbb{Y}$ be an inner product space (not necessarily complete), $\mathbb{S}$ a subspace
of $\mathbb{Y}$, and $\mathbf{y}$ an arbitrary vector in $\mathbb{Y}$. If there is a vector $\mathbf{s}_0 \in \mathbb{S}$ such that
$\|\mathbf{y} - \mathbf{s}_0\| \le \|\mathbf{y} - \mathbf{s}\|$ for all $\mathbf{s} \in \mathbb{S}$, then $\mathbf{s}_0$ is unique. A necessary and sufficient
condition that $\mathbf{s}_0$ be a unique minimizing vector in $\mathbb{S}$ is that the error vector $\mathbf{y} - \mathbf{s}_0$
be orthogonal to $\mathbb{S}$ [122].

Theorem 36 (Classical Projection Theorem) Let $\mathbb{H}$ be a Hilbert space and $\mathbb{S}$ a
closed subspace of $\mathbb{H}$. Corresponding to any vector $\mathbf{y} \in \mathbb{H}$, there is a unique vector
$\mathbf{s}_0 \in \mathbb{S}$ such that $\|\mathbf{y} - \mathbf{s}_0\| \le \|\mathbf{y} - \mathbf{s}\|$ for all $\mathbf{s} \in \mathbb{S}$. Furthermore, a necessary and
sufficient condition that $\mathbf{s}_0 \in \mathbb{S}$ be the unique minimizing vector is that the error
vector $\mathbf{y} - \mathbf{s}_0$ be orthogonal to $\mathbb{S}$ [122].

Figure 3.3 An Illustration of the Projection Theorem.

Example. Let $\mathbb{H} = \{(x, y) \mid x, y \in \mathbb{R}\}$ be the Euclidean 2-space ($\mathbb{R}^2$) with the usual
inner product, the dot product. Let $\mathbb{S} = \{(x, y) \mid x, y \in \mathbb{R}, y = 0\}$ be the x-axis and
$(x_s, 0)$ be an arbitrary vector (or a point in this case) in $\mathbb{S}$. We can easily visualize
that any point in the x-y plane, e.g., point $(x_0, y_0)$ in Figure 3.3, corresponds to a unique
point $(x_0, 0)$ on the x-axis that is found by projecting onto the x-axis. The error vector
between an arbitrary vector in $\mathbb{H}$ and its corresponding vector in $\mathbb{S}$ is given by
$(x_0, y_0) - (x_s, 0) = (x_0 - x_s, y_0 - 0) = (x_0 - x_s, y_0)$. We find $x_s$ by minimizing the
norm of the error, which in this example is given by the square root of the sum
of the squares: $[(x_0 - x_s)^2 + y_0^2]^{1/2}$. By inspection, we can see that the norm of
the error is minimized when $x_s = x_0$. Thus, the optimal point on the x-axis, which
minimizes the norm of the error, is $(x_0, 0)$ as shown in Figure 3.3. Hence, the error
vector is $(x_0 - x_s, y_0)|_{x_s = x_0} = (0, y_0)$. Next, we use the dot product to show that the
error vector, $(0, y_0)$, is orthogonal to the optimal projection of point $(x_0, y_0)$ onto $\mathbb{S}$,
namely $(x_0, 0)$: $(0, y_0) \cdot (x_0, 0) = 0 \cdot x_0 + y_0 \cdot 0 = 0$. Q.E.D.
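The worked example above can also be checked numerically. The following sketch projects the point $(3, 4)$ onto the x-axis, confirms that the error vector is orthogonal to the subspace, and compares the error norm against a grid of other candidate points on the axis. The specific point, the grid of candidates, and the helper names are assumptions chosen for illustration.

```python
import math

def inner(x, y):
    """Dot product on R^2."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the dot product."""
    return math.sqrt(inner(x, x))

def project(y, s):
    """Orthogonal projection of y onto the line spanned by the nonzero vector s."""
    coeff = inner(y, s) / inner(s, s)
    return [coeff * c for c in s]

y = [3.0, 4.0]   # the point (x0, y0)
s = [1.0, 0.0]   # direction spanning the x-axis

s0 = project(y, s)                     # optimal point (x0, 0)
err = [a - b for a, b in zip(y, s0)]   # error vector (0, y0)

# Compare against other candidate points (xs, 0) on the x-axis.
candidates = [[k / 10.0, 0.0] for k in range(-100, 101)]
is_minimal = all(
    norm(err) <= norm([a - b for a, b in zip(y, c)]) for c in candidates
)
print(s0, err, inner(err, s), is_minimal)
```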
Figure 3.4 Boxology of a Random Vector.

For the static state estimation problem, we consider a Hilbert space of (possibly
infinite-dimensional) random vectors⁴ $\mathbf{x} \in \tilde{\mathbb{X}} = \mathfrak{F}(\Omega, \mathbb{X})$ and a related linear space
given by $\tilde{\mathbb{Z}} = \mathfrak{F}(\Omega, \mathbb{Z})$ for the single observation case. The notation $\mathfrak{F}(\cdot, \cdot)$ denotes
a linear space of functions, $\Omega$ is a non-empty set called the sample space, $\mathbb{X}$ is
the state space, i.e., the space of realizations of $\mathbf{x}$, and $\mathbb{Z}$ is the observation space.
Using the boxology previously employed, we can show graphically that for each
experiment, an $\omega$ is “chosen” from the sample space $\Omega$; this choice dictates which
$\mathbb{Y}$-valued random vector (or function) in $\tilde{\mathbb{Y}}$ represents the state, as shown in Figure
3.4. Thus, Figure 3.4 illustrates how the probability space, random vector space,
and realization space are interrelated. We use the tilde above $\mathbb{Y}$ to help us associate
the set of functions that map points (outcomes) in the sample space to vectors in the
space of realizations. The remainder of the notation in the figure will be explained
in the following definitions on the next few pages. Refer back to Figure 3.4 to see
how the concepts are related.

Recall that our goal in this research is to solve an optimization problem in
order to find the “best” state estimate, $\hat{\mathbf{x}}$, which minimizes the variance of the
difference between the state estimate and the true state (i.e., the state estimation error)
for a given measurement $\mathbf{z} \in \mathbb{Z}$. To accomplish this, we need to define the probability
space, random vector space, expected value, covariance, and conditional expectation, among other
mathematical and probability theory topics. First, we will discuss a few technical
topics from measure theory that lead up to a definition of the probability space.

⁴In the language of vector spaces, every element in a vector space is called a vector. Thus, these
random vectors may be random variables, random functions, or random matrices.

Definition 37 ($\sigma$-Algebra and $\sigma$-Field) Let $\mathbb{S}$ be a nonempty set. A $\sigma$-algebra
on $\mathbb{S}$ is a collection of subsets of $\mathbb{S}$ that contains the set $\mathbb{S}$ itself, the empty set,
the complements of all the members of the collection, and all countable unions of
members. A $\sigma$-algebra is also called a $\sigma$-field [25].

Definition 38 (Borel Sets) For set $\mathbb{A}$, the collection $\mathfrak{B}(\mathbb{A})$ of Borel sets is the
smallest $\sigma$-algebra which contains all of the open subsets of $\mathbb{A}$ [166].

Definition 39 (Measurable Space and Measurable Set) A measurable space
is a pair $(\mathbb{X}, \mathcal{G})$, consisting of nonempty set $\mathbb{X}$ and $\sigma$-algebra $\mathcal{G}$ of subsets of $\mathbb{X}$. A
subset $\mathbb{Y}$ of $\mathbb{X}$ is called measurable (or measurable with respect to $\mathcal{G}$) if $\mathbb{Y} \in \mathcal{G}$ [166].

Definition 40 (Measure and Measure Space) Let $X$ be a nonempty set and $\mathcal{G}$ be a $\sigma$-algebra defined on $X$. A measure $\mu$ on the measurable space $(X, \mathcal{G})$ is a nonnegative set function defined for all sets in $\mathcal{G}$ and satisfying $\mu(\emptyset) = 0 \in \mathbb{R}$, where $\emptyset$ is the empty or null set, and

$$\mu\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} \mu(E_i)$$

for any sequence $E_1, E_2, \ldots$ of disjoint measurable sets. A measure space $(X, \mathcal{G}, \mu)$ is a triplet formed by a measurable space $(X, \mathcal{G})$ with a measure $\mu$ defined on $\mathcal{G}$ [166].
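On a finite set, the closure conditions of a $\sigma$-algebra and the additivity of a measure can be checked by direct enumeration. The following Python sketch is illustrative only: the set `S`, the collection `G`, the uniform point weights, and the helper names `is_sigma_algebra` and `measure` are assumptions introduced here, not objects from the cited references. On a finite collection, closure under countable unions reduces to closure under pairwise unions.

```python
from fractions import Fraction
from itertools import combinations

def is_sigma_algebra(S, G):
    """Check the sigma-algebra closure conditions on a finite set S."""
    S = frozenset(S)
    G = {frozenset(A) for A in G}
    # the whole set and the empty set must be members
    if S not in G or frozenset() not in G:
        return False
    # closed under complements
    if any(S - A not in G for A in G):
        return False
    # closed under unions (pairwise closure suffices for a finite collection)
    return all(A | B in G for A, B in combinations(G, 2))

S = {1, 2, 3, 4}
G = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]

# hypothetical measure with total mass one: uniform weights on the points of S
weights = {s: Fraction(1, 4) for s in S}

def measure(A):
    """Measure of a subset A of S as the sum of its point weights."""
    return sum((weights[s] for s in A), Fraction(0))

ok = is_sigma_algebra(S, G)
not_ok = is_sigma_algebra(S, [set(), {1}, S])  # the complement {2, 3, 4} is missing
additive = measure({1, 2} | {3, 4}) == measure({1, 2}) + measure({3, 4})
```

Exact rational arithmetic (`Fraction`) is used so that the additivity check is not affected by floating-point rounding.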

If the measure of $X$ is one, i.e., $\mu(X) = 1$, and $0 \le \mu(Y) \le 1$ for every $Y$ in $\mathcal{G}$, then $\mu$ may be called a probability measure. This leads us to define our probability space and other associated properties and terminology.

Definition 41 (Probability Space and Expectation) Suppose that the triplet $(\Omega, \mathcal{F}, P)$ is a complete probability space$^5$, where $\Omega$ is a nonempty set called the sample space, $\mathcal{F}$ is a $\sigma$-field which consists of a collection of subsets of $\Omega$, called events, and $P$ is a probability measure, a mapping that assigns a real number between zero and one (inclusive) to every event in $\mathcal{F}$, with the probability of the sure event $P(\Omega) = 1$. Let $Y$ be a Banach space. A $Y$-valued random variable is a map $y : \Omega \to Y$ which is strongly measurable$^6$ with respect to the probability measure $P$. If $y$ is integrable (in the sense of Bochner) on $\Omega$, we define the expectation operator, $E$, with the integral expression [38]

$$E(y) = \int_\Omega y \, dP = \int_\Omega y(\omega) \, P(d\omega) \qquad (3.53)$$

(Note that $y$ is said to be Bochner integrable if $\int_\Omega \|y\| \, dP < \infty$ [40].) The random vector $y$ induces a measure $P_y$ on $\mathfrak{B}(Y)$, the Borel sets of $Y$, which is defined as

$$P_y(A) = P\{\omega : y(\omega) \in A\} \qquad (3.54)$$

for $A \in \mathfrak{B}(Y)$, and thus $(Y, \mathfrak{B}(Y), P_y)$ is also a complete probability space. An equivalent way of expressing Equation (3.53) using the probability measure $P_y$ is [33]

$$E(y) = \int_Y y \, dP_y \qquad (3.55)$$

Remark The expectation operator is often subscripted with the pertinent random vector, as in $E_y(y)$. Additionally, we often denote the mean of a random vector by $\mu_y = E_y(y)$.

5 A probability space $(\Omega, \mathcal{F}, P)$ is complete if for every set $A \subset B$ such that $B \in \mathcal{F}$ and $P(B) = 0$ we have $A \in \mathcal{F}$ so that $P(A) = 0$ [66].

6 Strong and weak measurability concepts are identical for separable Hilbert spaces [38].

Definition 42 (Joint Probability and Expectation) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. Let $X$ and $Y$ be Banach spaces. The joint probability measure, $P_{x,y}$, induced by random vectors $x$ and $y$ on a collection of the events described by a relationship between the Borel sets $\mathfrak{B}(X)$ and $\mathfrak{B}(Y)$, is defined as

$$P_{x,y}(A, B) = P\{\omega : x(\omega) \in A \text{ and } y(\omega) \in B\} \qquad (3.56)$$

for $A \in \mathfrak{B}(X)$ and $B \in \mathfrak{B}(Y)$. The joint expectation of $g(x, y)$ is then

$$E_{x,y}[g(x, y)] = \int_\Omega g(x, y) \, dP = \int_X \int_Y g(x, y) \, dP_{x,y} \qquad (3.57)$$

where $g$ is a Baire function, i.e., a continuous function or the point-wise limit of a continuous function [129, 21].

Next,  as  a  continuation  of  Lemma  18,  given  on  page  3-13,  we  find  the  expected 
value  of  an  outer  product  of  random  vectors. 

Lemma 43 Let $U$, $V$, $X$, and $Y$ be Hilbert spaces of random vectors. For random vectors $x \in X$ and $y \in Y$, and BLTs $A \in \mathcal{BLT}(X, U)$ and $B \in \mathcal{BLT}(Y, V)$,

$$E_{x,y}[(Ax) \circ (By)] = A \, E_{x,y}(x \circ y) \, B^* \qquad (3.58)$$

where $B^*$ is the adjoint of $B$ and $E_{x,y}$ is the joint expectation operator.

Proof of Lemma 43 From Lemma 18, for random vectors we have

$$(Ax) \circ (By) = A (x \circ y) B^* \qquad (3.59)$$

Taking the expectation of both sides yields

$$E_{x,y}[(Ax) \circ (By)] = E_{x,y}[A (x \circ y) B^*] \qquad (3.60)$$

Since $A$ and $B^*$ are nonrandom BLTs, the expectation commutes with $A$ and $B^*$ and we can pull them outside of the expectation, i.e., $A$ and $B^*$ are not functions of $\omega$ and thus they can be factored out of the integral defining the expectation in Equation (3.53) on page 3-25, and thus

$$E_{x,y}[(Ax) \circ (By)] = A \, E_{x,y}(x \circ y) \, B^* \qquad (3.58)$$


Therefore,  the  lemma  holds.  ■ 
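In finite dimensions the outer product $x \circ y$ is the matrix $x y^T$ and the adjoint $B^*$ is the transpose, so Lemma 43 can be checked on a small finite sample space. The sketch below is an illustration under those assumptions; the sample values, probabilities, and the matrices `A` and `B` are hypothetical, and the helper functions are written out only to keep the example self-contained.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def outer(x, y):
    """Matrix of the outer product x o y, i.e. the map h -> x <y, h>."""
    return [[xi * yj for yj in y] for xi in x]

def expect(mats, probs):
    """Expectation of a matrix-valued random variable on a finite sample space."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(p * M[i][j] for p, M in zip(probs, mats)) for j in range(cols)]
            for i in range(rows)]

# finite sample space with three outcomes (hypothetical values)
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
xs = [[1, 2], [0, -1], [3, 1]]            # x(w) in R^2
ys = [[1, 0, 2], [2, 1, 0], [-1, 1, 1]]   # y(w) in R^3
A = [[1, 2], [0, 1]]                      # nonrandom map R^2 -> R^2
B = [[1, 0, 1], [2, 1, 0]]                # nonrandom map R^3 -> R^2

# left side: E[(Ax) o (By)]
lhs = expect([outer(matvec(A, x), matvec(B, y)) for x, y in zip(xs, ys)], probs)
# right side: A E[x o y] B^T
rhs = matmul(matmul(A, expect([outer(x, y) for x, y in zip(xs, ys)], probs)),
             transpose(B))
```

Because the expectation here is a finite weighted sum, the two sides agree exactly, which mirrors the step in the proof where the nonrandom transformations are factored out of the integral.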

The separable Hilbert spaces of Lebesgue L1 and L2 functions are the two most important spaces that we use to form our random vector spaces in this research. We give them here in the following example for $1 \le p < \infty$.

Example. Let $X$ be a separable Hilbert space. The notation $L^p(\Omega, P; X)$ denotes the separable Hilbert space of Lebesgue $L^p$ functions that are measurable with respect to $P$; these functions map the sample space $\Omega$ to the realization space $X$, hence $x$ is an $X$-valued random vector. The $L^p$ norm is [154]

$$\|x\|_{L^p} = \left[ E\big(\|x\|_X^p\big) \right]^{1/p} = \left[ \int_\Omega \|x(\omega)\|_X^p \, dP(\omega) \right]^{1/p} \qquad (3.61)$$


and $\|x(\omega)\|_X$ can be evaluated using Equation (3.4) when $X = L^p[a, b]$. If $X$ is an
$N$-dimensional Euclidean space $\mathbb{R}^N$ and $p = 2$, then the two norm is written as:

$$\|x(\omega)\|_X^2 = x^T(\omega) \, x(\omega).$$

For finite-dimensional problems, we can refine our definition of the expectation
operator  in  Equation  (3.53)  if  we  introduce  the  concept  of  a  probability  distribution 
function. 


Definition 44 (Probability Distribution Function) The probability distribution function for a $Y = \mathbb{R}^N$-valued random variable, $\mathbf{y} : \Omega \to Y$, on the complete probability space $(\Omega, \mathcal{F}, P)$, is given as [66]

$$F_{\mathbf{y}}(y) = P\big(\mathbf{y}^{-1}(-\infty, y]\big) = P(\{\omega : \mathbf{y}(\omega) \le y\}) = P(\mathbf{y} \le y) \qquad (3.62)$$

where the inequality is taken element by element for the case of a multi-dimensional vector. At the extremes, $F(-\infty) = 0$ and $F(\infty) = 1$.

Example. If we let $Y = \mathbb{R}$, then Equation (3.62) is the accumulation of probability for $\mathbf{y}(\omega)$ values less than or equal to $y$.
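The element-by-element inequality in Equation (3.62) can be made concrete with a random vector on a finite sample space. This is a minimal sketch under assumed data: the four equally likely outcomes in `ys` and the function name `F` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# four equally likely outcomes of an R^2-valued random vector (assumed data)
P = [Fraction(1, 4)] * 4
ys = [(0, 0), (1, 0), (0, 2), (1, 2)]

def F(y):
    """Distribution function P(y(w) <= y), with the inequality taken element by element."""
    return sum((p for p, v in zip(P, ys)
                if all(vi <= yi for vi, yi in zip(v, y))), Fraction(0))
```

For example, `F((0, 2))` accumulates the probability of the outcomes `(0, 0)` and `(0, 2)`, since only those two satisfy both component inequalities.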

The  following  definitions  lay  out  a  series  of  properties  pertaining  to  random 
vectors.  For  example,  we  are  interested  in  the  covariance  operator  —  an  extension 
of  the  familiar  covariance  matrix  for  infinite-dimensional  systems.  Since  our  work  in 
this  dissertation  primarily  uses  separable  Hilbert  spaces  (of  random  vectors),  some 
of  the  following  definitions  and  results  apply  only  to  separable  Hilbert  spaces. 

Note  that  in  the  following  pair  of  “second  moment”  definitions,  we  restrict  our 
attention  to  a  separable  Hilbert  space  of  Lebesgue  L2  functions.  Thus,  the  covariance 
and correlation operators (as well as the cross-covariance and cross-correlation transformations)
will be bounded, and hence continuous, operators (transformations)
since Lebesgue L2 functions are absolutely square integrable. Thus, while references
[38,  40]  stipulate  that  the  covariance  operator  is  symmetric,  the  covariance  operator 
formed  using  Lebesgue  L2  functions  creates  a  bounded  symmetric  operator;  hence 
the  covariance  operator  is  self-adjoint  as  noted  in  the  remark  following  the  definition 
of  a  symmetric  operator. 

Definition 45 (Covariance and Cross-Covariance) Let $X$ be a separable Hilbert space. The covariance operator for an $X$-valued random vector, $x \in \mathcal{X} = L^2(\Omega, P; X)$, is defined by [38, 40]

$$\Sigma(x) \triangleq E[(x - \mu_x) \circ (x - \mu_x)] \qquad (3.63)$$

$$= \int_\Omega [(x - \mu_x) \circ (x - \mu_x)] \, dP \qquad (3.64)$$

$$= \int_X [(x - \mu_x) \circ (x - \mu_x)] \, dP_x \qquad (3.65)$$

where $\mu_x = E(x)$ and $P_x$ is a probability measure induced by random vector $x$. The covariance operator is self-adjoint, positive, and nuclear on $X$.

Similarly, for random variables $x \in \mathcal{X} = L^2(\Omega, P; X)$ and $y \in \mathcal{Y} = L^2(\Omega, P; Y)$, the cross-covariance transformation is defined by

$$\Sigma(x, y) = E[(x - \mu_x) \circ (y - \mu_y)] \qquad (3.66)$$

$$= \int_\Omega [(x - \mu_x) \circ (y - \mu_y)] \, dP \qquad (3.67)$$

$$= \int_X \int_Y [(x - \mu_x) \circ (y - \mu_y)] \, dP_{x,y} \qquad (3.68)$$

where $\mu_y = E(y)$ and $P_{x,y}$ is a probability measure induced jointly by random vectors $x$ and $y$. Additionally, $x$ and $y$ are said to be uncorrelated whenever $\Sigma(x, y) = 0$.

Remarks (1) For the special case in which $x, y \in \mathcal{X} = L^2(\Omega, P; X)$, the cross-covariance, while neither self-adjoint nor symmetric, is now a nuclear operator [40]. (2) Since the covariance operator is nuclear, it is also bounded and linear and is thus a BLO; hence, $\Sigma(x) \in \mathcal{BLO}(X)$. Note that $(\mathcal{BLO}(X), \|\cdot\|)$ is a Banach space, with the operator norm, provided that $X$ is a complete normed linear space [154]. The cross-covariance transformation is also bounded and lives in a Banach space of BLTs, i.e., $\Sigma(x, y) \in \mathcal{BLT}(Y, X)$.

Next, we will use the above definitions for the covariance operator and the cross-covariance transformation to define the non-central second moments: the correlation operator and the cross-correlation transformation.

Definition 46 (Correlation and Cross-Correlation) Let $X$ be a separable Hilbert space. The correlation operator for an $X$-valued random vector, $x \in \mathcal{X} = L^2(\Omega, P; X)$, which is self-adjoint, positive, and nuclear, is defined by

$$\Xi(x) = E(x \circ x) \qquad (3.69)$$

$$= \int_\Omega (x \circ x) \, dP \qquad (3.70)$$

$$= \int_X (x \circ x) \, dP_x \qquad (3.71)$$

where, as we saw before, $P_x$ is a probability measure induced by random vector $x$.

Similarly, for random variables $x \in \mathcal{X} = L^2(\Omega, P; X)$ and $y \in \mathcal{Y} = L^2(\Omega, P; Y)$, the cross-correlation transformation is defined by

$$\Xi(x, y) = E(x \circ y) \qquad (3.72)$$

$$= \int_\Omega (x \circ y) \, dP \qquad (3.73)$$

$$= \int_X \int_Y (x \circ y) \, dP_{x,y} \qquad (3.74)$$

where $P_{x,y}$ is a probability measure induced jointly by random vectors $x$ and $y$.

Remarks (1) For the special case in which $x, y \in \mathcal{X} = L^2(\Omega, P; X)$, the cross-correlation defined above, while neither self-adjoint nor symmetric, is now a nuclear operator. (2) Since the covariance operator is nuclear, it is also bounded and linear and is thus a BLO; thus it follows that the correlation operator is also a BLO, hence, $\Xi(x) \in \mathcal{BLO}(X)$. The cross-correlation transformation is also bounded and lives in a Banach space of BLTs, i.e., $\Xi(x, y) \in \mathcal{BLT}(Y, X)$.

An extremely useful identity for the covariance, $\Sigma(x)$, of random vector $x$ that we will use several times in this chapter is given next.

Lemma 47 If $x \in \mathcal{X} = L^2(\Omega, P; X)$ is an $X$-valued random vector for separable Hilbert space $X$, then [40]

$$\mathrm{tr}[\Sigma(x)] = E\{\langle x - \mu_x, x - \mu_x \rangle\} \qquad (3.75)$$

where $\mu_x$ is the mean of random vector $x$.

Proof  of  Lemma  47  Begin  with  the  definition  for  the  covariance  and  then  by 
continuity  interchange  the  linear  operations  for  the  trace  [37]  and  expectation  [154] 
to  get 


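$$\mathrm{tr}[\Sigma(x)] = \mathrm{tr}\{E[(x - \mu_x) \circ (x - \mu_x)]\} \qquad (3.76)$$

$$= E\{\mathrm{tr}[(x - \mu_x) \circ (x - \mu_x)]\} \qquad (3.77)$$

$$= E\{\langle x - \mu_x, x - \mu_x \rangle\} \qquad (3.78)$$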

where  line  three  follows  from  Lemma  32.  ■ 
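For a finite-dimensional random vector the identity of Lemma 47 states that the trace of the covariance matrix equals the mean squared distance from the mean. The sketch below checks this on a small finite sample space; the outcome values and probabilities are hypothetical, and exact rational arithmetic is used so that the comparison is an equality rather than an approximation.

```python
from fractions import Fraction

# finite sample space with three outcomes of an R^2-valued random vector (assumed data)
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
xs = [[1, 2], [0, -1], [3, 1]]

n = len(xs[0])
# mean vector mu_x = E(x)
mean = [sum(p * x[i] for p, x in zip(probs, xs)) for i in range(n)]

# covariance matrix E[(x - mu)(x - mu)^T], the finite-dimensional outer product
cov = [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for p, x in zip(probs, xs))
        for j in range(n)] for i in range(n)]
trace_cov = sum(cov[i][i] for i in range(n))

# E<x - mu, x - mu>, the expected squared norm of the centered vector
mean_sq_norm = sum(p * sum((x[i] - mean[i]) ** 2 for i in range(n))
                   for p, x in zip(probs, xs))
```

The two quantities `trace_cov` and `mean_sq_norm` coincide because the trace of the outer product of a vector with itself is its squared norm, which is the step attributed to Lemma 32 in the proof.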

As a corollary to Lemma 47, the identity can be extended for the cross-covariance operator $\Sigma(x, y)$, where $x, y \in \mathcal{X} = L^2(\Omega, P; X)$.

Corollary 48 If $x, y \in \mathcal{X} = L^2(\Omega, P; X)$ are $X$-valued random vectors on a separable Hilbert space, then

$$\mathrm{tr}[\Sigma(x, y)] = E\{\langle x - \mu_x, y - \mu_y \rangle\} \qquad (3.79)$$

where $\mu_x$ is the mean of random vector $x$ and $\mu_y$ is the mean of random vector $y$.

For the special case of finite-dimensional random vectors, two vectors are termed statistically orthogonal whenever their cross-correlation matrix is the zero matrix [129], i.e., whenever $\Xi(x, y) = E(x y^T) = 0$. To extend this concept for infinite-dimensional systems, we propose the following definition:

Definition 49 (Statistically Orthogonal Vectors) Let $x \in \mathcal{X} = L^2(\Omega, P; X)$ and $y \in \mathcal{Y} = L^2(\Omega, P; Y)$ be random vectors with separable Hilbert spaces $X$ and $Y$, respectively. Any two vectors, $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, are statistically orthogonal whenever their cross-correlation is the zero transformation, i.e.,

$$\Xi(x, y) = E(x \circ y) = 0 \in \mathcal{BLT}(Y, X) \qquad (3.80)$$

Remark  If  either  x  or  y  is  a  zero-mean  random  vector,  then  x  and  y  are  statistically 
orthogonal  whenever  x  and  y  are  either  independent  or  uncorrelated. 
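The remark can be verified directly from Definitions 45 and 46. Writing $\Xi(x, y)$ for the cross-correlation and $\Sigma(x, y)$ for the cross-covariance, the short derivation below is a sketch that assumes only the linearity of the expectation and the additivity of the outer product in each argument; it is not quoted from the cited references.

```latex
\begin{align*}
\Sigma(x, y) &= E[(x - \mu_x) \circ (y - \mu_y)] \\
  &= E(x \circ y) - E(x) \circ \mu_y - \mu_x \circ E(y) + \mu_x \circ \mu_y \\
  &= \Xi(x, y) - \mu_x \circ \mu_y
\end{align*}
```

Hence $\Xi(x, y) = \Sigma(x, y) + \mu_x \circ \mu_y$. If either mean is zero, the last term vanishes, so uncorrelated vectors, for which $\Sigma(x, y) = 0$, are also statistically orthogonal.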

A Hilbert space-valued random vector can be uniquely specified by its characteristic functional [38].

Definition 50 (Characteristic Functional) Consider a Hilbert space-valued random vector $x \in \mathcal{X} = L^1(\Omega, P; X)$, with the induced probability measure, $P_x$. With $j = \sqrt{-1}$, the characteristic functional for random vector $x$ is the mapping $\chi_x : X \to \mathbb{C}$ defined by [38, 40]

$$\chi_x(\xi) = E_x[\exp(j \langle x, \xi \rangle)] = \int_X \exp(j \langle x, \xi \rangle) \, dP_x \qquad (3.81)$$

for all $\xi \in X$.

Later  in  our  development  of  the  ISKF,  we  will  need  to  ascribe  the  Gaussian 
property  to  a  random  vector  of  interest,  and  subsequently  to  a  stochastic  process. 

Definition 51 (Gaussian Random Vector, Gaussian Measure) The characteristic functional for a Hilbert space-valued random vector $x \in \mathcal{X} = L^1(\Omega, P; X)$, with Gaussian probability measure $P_x$, is given by [38]

$$\chi_x(\xi) = \exp\left[ j \langle \mu_x, \xi \rangle - \tfrac{1}{2} \langle \Sigma_x \xi, \xi \rangle \right] \qquad (3.82)$$

where $\mu_x \in X$ is the mean and the covariance operator $\Sigma_x$, with the modified notation, which was previously defined in Definition 45, is positive, self-adjoint, and nuclear.

There are several different types of convergence for random vectors: a sequence of random vectors can converge in probability, in mean square, with probability one, or in distribution [38]. If $(\Omega, \mathcal{F}, P)$ is a complete probability space and $Y$ is a Banach space, then convergence of a sequence of $Y$-valued random vectors in $L^2(\Omega, P; Y)$ is called mean square convergence [66]; it is defined next. The remaining forms of convergence will not be addressed further in this research.

Definition 52 (Mean Square Convergence) Let $Y$ be a Banach space. A sequence $\{y_1, y_2, \ldots\}$ of $Y$-valued random vectors converges to $y$ in the mean square sense if [38, 66]

$$E\big(\|y_n - y\|_Y^2\big) \to 0 \quad \text{as } n \to \infty \qquad (3.83)$$

Definition 53 (Independence) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space.

(i). Events $A_1, A_2, \ldots, A_n \in \mathcal{F}$ are independent if [45, 24]

$$P\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} P(A_i) \qquad (3.84)$$

(ii). Let $y_i : \Omega \to Y_i$ be a random vector with Borel sets $\mathfrak{B}(Y_i)$ for Banach space $Y_i$. Random vectors $y_1, y_2, \ldots, y_n$ are independent if [45]

$$P\Big(\bigcap_{i=1}^{n} \{y_i \in A_i\}\Big) = \prod_{i=1}^{n} P(y_i \in A_i) \qquad (3.85)$$

for all $A_i \in \mathfrak{B}(Y_i)$ for each $i = 1, 2, \ldots, n$. Furthermore, both of these definitions can be extended to a countably infinite number of objects, whether events or random vectors. A countably infinite number of objects forms an independent set if every finite subcollection is an independent set [45].

(iii). Let $Y$ be a Banach space. $Y$-valued random vectors $y_1$ and $y_2$ are independent if $\{\omega : y_1(\omega) \in A_1\}$ and $\{\omega : y_2(\omega) \in A_2\}$ are independent sets in the $\sigma$-field $\mathcal{F}$ for any Borel sets $A_1, A_2 \in \mathfrak{B}(Y)$ [38, 45].

(iv). Now let $Y$ be a separable Hilbert space. If $y_1$ and $y_2$ are in $L^1(\Omega, P; Y)$ and are independent, then they are also uncorrelated [38]:

$$E_{y_1, y_2}(\langle y_1, y_2 \rangle_Y) = \langle E_{y_1}(y_1), E_{y_2}(y_2) \rangle_Y \qquad (3.86)$$

The MVU estimator that we will develop in this chapter is designed to exploit the statistical relationship between the observations and the states using the conditional expectation operator. The conditioning is accomplished relative to a sub $\sigma$-field of the observations $\sigma$-field.

Definition 54 (Sub $\sigma$-Field) Let the triplet $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal{S}$ be a subcollection of the $\sigma$-field $\mathcal{F}$, i.e., $\mathcal{S} \subset \mathcal{F}$. We call $\mathcal{S}$ a sub $\sigma$-field if $\mathcal{S}$ is also a $\sigma$-field of $\Omega$ [154].

Thus, we see that a sub $\sigma$-field $\mathcal{S}$ may only give us partial (or incomplete) knowledge of the $\sigma$-field $\mathcal{F}$. If the sub $\sigma$-field $\mathcal{S}$ is “equivalent” to the $\sigma$-field $\mathcal{F}$, except for a finite collection of sets of measure zero, then we gain complete knowledge, and our estimate using $\mathcal{S}$ becomes as good as using $\mathcal{F}$ since we will be taking an expectation of the state given the sure event.

Definition 55 (Conditional Expectation) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. Let $X$ be a Banach space. The conditional expectation of an $X$-valued random vector $x$ relative to sub $\sigma$-field $\mathcal{S}$, denoted by $\mathcal{E}(x \mid \mathcal{S})$, is defined by the relation [38, 154, 33, 40, 45]

$$\int_A x \, dP = \int_A \mathcal{E}(x \mid \mathcal{S}) \, dP \quad \text{for all } A \in \mathcal{S} \qquad (3.87)$$

Since this conditional expectation creates a random vector, we use a different symbol ($\mathcal{E}$) for the expectation operator. Note that $\mathcal{E}(x \mid \mathcal{S})$ is uniquely defined by this relationship and must be measurable relative to the sub $\sigma$-field $\mathcal{S}$.

Remark When Equation (3.87) is evaluated for a specific event, $A_i \in \mathcal{S}$, then we write

$$E(x \mid A_i) = \int_{A_i} \mathcal{E}(x \mid \mathcal{S}) \, dP \qquad (3.88)$$

Now  we  shall  adapt  this  definition  for  conditional  expectation  for  our  eventual 
estimation  purposes  in  the  following  example. 

Example. Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space and let $(X, \mathfrak{B}(X), P_x)$ and $(Z, \mathfrak{B}(Z), P_z)$ be separable Hilbert spaces of $X$- and $Z$-valued random vectors, respectively. We condition the expectation on the Borel sets $\mathfrak{B}(Z)$, a sub $\sigma$-field of $\mathcal{F}$, i.e., $\mathfrak{B}(Z) \subset \mathcal{F}$, and then write Equation (3.87) as

$$\int_A x \, dP = \int_A \mathcal{E}[x \mid \mathfrak{B}(Z)] \, dP \quad \text{for all } A \in \mathfrak{B}(Z) \qquad (3.89)$$


The boxology for this conditional expectation is shown in Figure 3.5. Note that $Y$ is a subspace of $X$ and the range of $y$ is a subset of $X$, i.e., $\mathcal{R}(y) \subset X$; thus we would not expect the conditional mean estimator $y$ to produce an estimate of $x$ that is equal to $x = \mathbf{x}(\omega)$ since $y$ maps vectors in $Z$ to a subspace of $X$.

Consider the special case where $x$ is measurable relative to the $\sigma$-field $\mathfrak{B}(Z)$, sometimes written as $x \in \mathfrak{B}(Z)$ [45]; then $\mathcal{E}[x \mid \mathfrak{B}(Z)] = x$, i.e., $x$ is already the best guess for $x$ [45, 38]. At the other extreme, consider when $x$ is independent of $\mathfrak{B}(Z)$; then knowing $\mathfrak{B}(Z)$ does not change the expectation, hence $\mathcal{E}[x \mid \mathfrak{B}(Z)] = E(x)$ [45].
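On a finite sample space, a sub $\sigma$-field is generated by a partition, and the conditional expectation is the probability-weighted average of $x$ over the block containing each outcome. The sketch below checks the defining relation (3.87) and the two special cases just described. The sample space, the values of `x`, the partition, and the helper names `cond_exp` and `integral` are all assumptions made for illustration; the trivial $\sigma$-field is used as one simple instance of a $\sigma$-field of which $x$ is independent.

```python
from fractions import Fraction
from itertools import combinations

omega = range(6)
P = {w: Fraction(1, 6) for w in omega}        # uniform probability (assumed)
x = {0: 4, 1: 2, 2: 1, 3: 1, 4: 7, 5: -3}     # scalar random variable x(w) (assumed)

def cond_exp(x, P, partition):
    """Conditional expectation of x given the sigma-field generated by a finite partition."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(P[w] * x[w] for w in block) / mass
        for w in block:
            out[w] = avg          # constant on each block, hence measurable
    return out

def integral(f, P, A):
    """Integral of f over the event A with respect to P."""
    return sum((P[w] * f[w] for w in A), Fraction(0))

partition = [{0, 1}, {2, 3, 4}, {5}]
e = cond_exp(x, P, partition)

# every event in the generated sigma-field is a union of blocks of the partition
events = [set().union(*c) for r in range(len(partition) + 1)
          for c in combinations(partition, r)]
defining_relation_holds = all(integral(x, P, A) == integral(e, P, A) for A in events)

finest = cond_exp(x, P, [{w} for w in omega])   # x is measurable: returns x itself
trivial = cond_exp(x, P, [set(omega)])          # trivial sigma-field: returns E(x)
```

Conditioning on the finest partition reproduces `x`, while conditioning on the trivial partition collapses to the unconditional mean, matching the two extremes in the preceding paragraph.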

We often write $\mathcal{E}(x \mid z)$ in place of the rigorous notation $\mathcal{E}[x \mid \mathfrak{B}(Z)]$. Additionally, we may add a subscript to the conditional expectation operator $\mathcal{E}$ when needed for clarity, e.g., the conditional expectation of the sum of random vectors $x + w$ with respect to random vector $x$ given $z$ is written as $\mathcal{E}_x(x + w \mid z)$, where $w$ is some $W$-valued random vector. When the conditional expectation $\mathcal{E}$ is evaluated for a particular event or realization of the sub $\sigma$-field or conditional random vector, e.g., $z \in \mathfrak{B}(Z)$ or $\mathbf{z}(\omega) = z$, then we no longer create a random vector with the conditional expectation operation (as noted in the remark following the definition), and thus we use the notations $E\{x \mid z \in \mathfrak{B}(Z)\}$ and $E[x \mid \mathbf{z}(\omega) = z]$ for the realizations of $\mathcal{E}\{x \mid \mathfrak{B}(Z)\}$ and $\mathcal{E}[x \mid z]$.

Figure 3.5 Boxology of a Conditional Expectation. Note that $\mathfrak{B}(Z) \subset \mathcal{F}$ and $Y$ is a subspace of $X$.


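Definition 56 (Conditional Covariance) Let $X$ be a separable Hilbert space. The conditional covariance operator for an $X$-valued random variable, $x \in \mathcal{X} = L^2(\Omega, P; X)$, relative to sub $\sigma$-field $\mathcal{S}$, denoted by $\Sigma(x \mid \mathcal{S})$, is given by

$$\Sigma(x \mid \mathcal{S}) = \mathcal{E}\{[x - \mathcal{E}(x \mid \mathcal{S})] \circ [x - \mathcal{E}(x \mid \mathcal{S})] \mid \mathcal{S}\} \qquad (3.90)$$

$$= \int_A \{[x - \mathcal{E}(x \mid \mathcal{S})] \circ [x - \mathcal{E}(x \mid \mathcal{S})] \mid \mathcal{S}\} \, dP \quad \text{for all } A \in \mathcal{S} \qquad (3.91)$$

Remark For a specific event, $A \in \mathcal{S}$, the conditional covariance is no longer a random quantity; it is given by:

$$\Sigma(x \mid A \in \mathcal{S}) = \int_A \{[x - \mathcal{E}(x \mid \mathcal{S})] \circ [x - \mathcal{E}(x \mid \mathcal{S})] \mid \mathcal{S}\} \, dP \qquad (3.92)$$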

Since our random vectors are allowed to evolve or change over time, we must
define  what  is  meant  by  a  stochastic  process  to  account  for  the  time-varying  nature. 

Definition 57 (Stochastic Process) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. A stochastic process is a family of random vectors, given as $\{\mathbf{x}(t) : t \in T\}$, that maps the product space $T \times \Omega$ to the realization space $X$ for each fixed $t \in T$, where $T$ is a set used to index the random vectors [141, 21]. The stochastic process is discrete if $T$ is countable and continuous if $T$ is homeomorphic to $\mathbb{R}$ or some interval on $\mathbb{R}$. The stochastic process is denoted by either $\mathbf{x}$ or $\mathbf{x}(\cdot, \cdot)$. Furthermore, both $\mathbf{x}(t)$ and $\mathbf{x}(t, \cdot)$ are used to denote a random vector, while $\mathbf{x}(t, \omega) = x(t)$ is a realization of the random vector $\mathbf{x}(t, \cdot)$ and $\mathbf{x}(\cdot, \omega) = x(\omega)$ is a sample of the stochastic process $\mathbf{x}$.

Definition 58 (Covariance Kernel) Let $X$ be a separable Hilbert space and $x(t) \in \mathcal{X} = L^2(\Omega, P; X)$ be an $X$-valued random vector for each $t \in T$. The covariance kernel is an operator on a stochastic process $\{x(t) : t \in T\}$ defined by

$$\Sigma[x(t), x(\tau)] = E\{[x(t) - \mu_x(t)] \circ [x(\tau) - \mu_x(\tau)]\} \qquad (3.93)$$

$$= \int_\Omega \{[x(t) - \mu_x(t)] \circ [x(\tau) - \mu_x(\tau)]\} \, dP \qquad (3.94)$$

$$= \int_X \{[x(t) - \mu_x(t)] \circ [x(\tau) - \mu_x(\tau)]\} \, dP_{x(t)} \qquad (3.95)$$

for $t, \tau \in T$, where $\mu_x(t) = E[x(t)]$, $\mu_x(\tau) = E[x(\tau)]$, and $P_{x(t)}$ is a probability measure induced by random vector $x(t)$.

The models used by the estimators in this research can, in theory, be driven
by most any noise process, provided that we know (at least) the first two moments.
However, filter synthesis may prove exceedingly difficult (and the resulting filter
suboptimal) unless the discrete-time model is driven by a Gaussian process. Note
that  the  Wiener  (or  Brownian  motion)  process  we  use  to  drive  the  continuous-time 
model  has  independent  increments  that  are  Gaussian.  The  following  definitions  will 
describe  Gaussian,  independent  increment,  and  finally,  Wiener  (or  Brownian  motion) 
processes. 

Definition 59 (Gaussian Process) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. An $\mathbb{R}$-valued stochastic process, $\{x(t) : t \in T\}$, is called a Gaussian process if every finite collection of random vectors is jointly Gaussian [42].

Definition 60 (Independent Increment Process) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space and $\{x(t) : t \in T\}$ be a stochastic process, where $T$ is a discrete set such that $0 = t_0 < t_1 < \cdots < t_f$. An increment is defined by the difference of two random vectors as $[x(t_j) - x(t_i)]$, where $t_i < t_j$ for $i < j$. If the disjoint increments $[x(t_j) - x(t_i)]$ and $[x(t_l) - x(t_k)]$ are independent, i.e., Definition 53 is satisfied for every disjoint pair of increments, where without loss of generality we have $t_0 \le t_i < t_j \le t_k < t_l \le t_f$, then the process is called an independent increment process.

Many  texts  on  probability  theory,  stochastic  processes,  and  filtering  theory 
devote  an  entire  section  (or  chapter)  to  the  Wiener  (or  Brownian  motion7)  process 
[42, 91, 141, 129, 21, 40, 66, 45, 24]. Falb defined a Wiener process in his development
of  the  Kalman-Bucy  filter  on  a  Hilbert  space  [51].  We  will  follow  Curtain  and 
Pritchard’s  [38]  abstract  presentation  in  the  following  definition  for  a  Wiener  process 
or  Brownian  motion. 

7 The nineteenth century botanist Robert Brown studied the random thermal motion of grain particles suspended in a fluid and Norbert Wiener developed the mathematical foundation for this type of random motion [209, 21].

Definition 61 (Wiener Process or Brownian Motion) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. The $X$-valued stochastic process $\{b(t) : t \in [0, t_f]\}$ is called a Wiener process (or Brownian motion)$^8$ [38] on $[0, t_f]$ if it is an $X$-valued process on $[0, t_f]$, such that $[b(t) - b(s)] \in L^2(\Omega, P; X)$ for all $s, t \in [0, t_f]$ and

1. $E[b(t) - b(s)] = 0$

2. $\Sigma[b(t) - b(s)] = (t - s) Q$, where the constant $Q \in \mathcal{BLO}(X)$ is symmetric, positive, and nuclear

3. $[b(s_4) - b(s_3)]$ and $[b(s_2) - b(s_1)]$ are independent whenever $0 \le s_1 < s_2 \le s_3 < s_4 \le t_f$

4. $b(t)$ has continuous sample paths on $[0, t_f]$

Additionally, the increment $[b(t) - b(s)]$ is Gaussian distributed with zero mean and covariance $\Sigma[b(t) - b(s)] = (t - s) Q$, where the constant $Q$ is often called the diffusion of this constant-diffusion process featuring independent increments.

The Wiener process can be further generalized to include a time-varying diffusion [129]

$$\Sigma[b(t) - b(s)] = \int_s^t Q(\tau) \, d\tau, \quad t > s \qquad (3.96)$$

but  we  will  not  use  this  generality  in  the  sequel  since  our  problems  of  interest  do  not 
require  this  property. 
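For the constant-diffusion case, the increment properties of Definition 61 can be illustrated with a scalar simulation. The sketch below is not part of the development: it assumes a scalar process started at $b(0) = 0$, builds each path from independent Gaussian increments of variance $Q \, \Delta t$, and uses hypothetical parameter values and a fixed seed. It then compares the sample mean and variance of one increment with the theoretical values $0$ and $(t - s) Q$.

```python
import math
import random

def simulate_wiener(q, t_final, n_steps, rng):
    """One sample path of a scalar Wiener process with diffusion q on [0, t_final]."""
    dt = t_final / n_steps
    path = [0.0]                      # assumed initial condition b(0) = 0
    for _ in range(n_steps):
        # independent Gaussian increment with zero mean and variance q * dt
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(q * dt)))
    return path

rng = random.Random(0)                # fixed seed for repeatability
q, t_final, n_steps, n_paths = 2.0, 1.0, 10, 4000
s_idx, t_idx = 3, 8                   # grid indices of s = 0.3 and t = 0.8

increments = []
for _ in range(n_paths):
    b = simulate_wiener(q, t_final, n_steps, rng)
    increments.append(b[t_idx] - b[s_idx])

mean_inc = sum(increments) / n_paths
var_inc = sum((d - mean_inc) ** 2 for d in increments) / (n_paths - 1)
expected_var = q * (t_idx - s_idx) * t_final / n_steps   # (t - s) * Q
```

With these values the theoretical increment variance is $(0.8 - 0.3) \cdot 2 = 1$, and the sample statistics should fall close to it, subject to the usual Monte Carlo error.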

Definition 62 (Discrete-Time White Noise Process) Let the triplet $(\Omega, \mathcal{F}, P)$ be a complete probability space. A discrete-time white noise process, $n(\cdot, \cdot)$, is defined as a collection of zero-mean independent random vectors $\{n(t_i) : t_i \in T\}$ with covariance kernel operator

$$\Sigma[n(t_i), n(t_j)] = \begin{cases} \Sigma[n(t_i)], & t_i = t_j \\ 0, & t_i \ne t_j \end{cases} \qquad (3.97)$$

where $\Sigma[n(t_i)]$ is positive (or positive semi-definite when the covariance takes the form of a matrix) and the cross-covariance between noise vectors at different times is zero since random vectors taken from a white noise process are independent in time and thus uncorrelated.

8 We pay homage to Brown by denoting our Wiener process with the Arabic letter b, for Brownian motion.

Remarks  (1)  Per  Definition  45,  the  covariance  operator  is  only  guaranteed  to  be 
positive.  (2)  However,  if  we  assume  that  there  is  “noise”  in  every  dimension,  i.e., 
that  there  are  no  zero  eigenvalues,  then  the  noise  covariance  is  a  strictly  positive 
operator  (or  positive  definite  matrix).  This  “noise  in  every  channel”  assumption 
then  gives  rise  to  a  definition  that  provides  a  sufficient  condition  for  guaranteeing 
that  the  covariance  operator  (or  matrix)  for  the  measurements  is  invertible. 

In  the  next  section,  we  will  begin  to  develop  the  tools  needed  to  create  the 
ISKF,  but  first,  note  that: 


1.  All  of  the  random  vectors  in  the  remainder  of  the  chapter  are  based  on  the  fact 
that the triplet $(\Omega, \mathcal{F}, P)$ is a complete probability space as described in this
section  and  specifically  in  Definition  41.  For  the  sake  of  brevity,  this  statement 
will  not  be  included  in  any  of  the  definitions  or  theorems  in  the  following 
sections,  unless  it  is  needed  for  clarity. 

2.  We  have  generally  tried  to  avoid  subscripting  expectation  operators  to  provide 
a  cleaner  look.  The  expectation  is  to  be  taken  in  a  joint  sense  when  there  is 
more  than  one  random  vector  involved  and  will  be  specifically  noted  when  this 
is not the case. For example, we use $E(x)$ for $E_x(x)$ and $E(x + y)$ for $E_{x,y}(x + y)$,
while $E_x(x + y)$ will retain the subscripting.

3.3 Linear Infinite-Dimensional Minimum Variance Unbiased Estimator

There are two main classes of statistical signal processing techniques: the Fisher or classical approach, in which the signal (or parameter) of interest is assumed to be deterministic, yet unknown, and the Bayesian approach, in which the signal (or parameter) of interest is random [100, 14]. Kalman applied the Bayesian approach to estimation theory [95]. Classical methods include techniques to produce least squares (LS) and maximum likelihood (ML) estimators. The Bayesian counterparts to these methods are the minimum mean-squared error (MMSE) and maximum a posteriori (MAP) estimators, respectively. An unbiased estimator that minimizes the variance of the error is called the MVU estimator; this terminology is ordinarily associated with classical estimators [100]; however, some authors [33] use MVU as a synonym for MMSE estimators, and others [122] apply the terminology of MVU estimation to both classes of estimators. In this work, we will define the MVU estimator and employ it as a Bayesian estimation technique. A benefit of the Bayesian approach is that prior information is easily incorporated by the estimator, whereas classical techniques do not lend themselves as effortlessly to the admission of prior information. We can, however, disguise the prior information in the form of a previous measurement output to achieve a similar effect. Luenberger [122] and Catlin [33] have derived MVU estimators for finite-dimensional models.

In  this  section,  we  give  a  series  of  definitions,  lemmas,  and  theorems  as  we 
build  up  the  machinery  to  derive  several  estimators.  First,  we  create  a  linear  infinite¬ 
dimensional  MVU  estimator  (LIMVUE)  for  the  case  of  correlated  states  and  obser¬ 
vations  (CSO)  and  then  follow  with  a  more  specific  LIMVUE  theorem  to  extend  the 
applicability  of  finite- dimensional  MVLI  estimators  to  infinite-dimensional  problems 
using  a  technique  often  employed  to  find  linear  MMSE  estimators.  Before  getting 
started  with  our  LIMVUE  theorems,  we  shall  define  several  terms  used  in  describing 
of  Bayesian  estimation.  While  there  is  more  than  one  way  to  define  an  estimator, 
the following definition serves to support this research.

Definition 63 (Estimator, Estimate) Let $x \in \mathbb{X}$ be the state and $z \in \mathbb{Z}$ be a measurement. An estimator of random vector $x$ is defined as a random vector [33]

$$\hat{x} = g \circ z \tag{3.98}$$

where $g$ is a Baire function and $\circ$ is the composition operator. When we are given measurement $z(\omega) = z$, then an estimate of $x$ is

$$\hat{x}(\omega) = g(z(\omega)) = g(z) \tag{3.99}$$

Thus, an estimate is a realization of the estimator.
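The distinction between an estimator (a random vector) and an estimate (one realization) can be illustrated with a minimal sketch. The measurement map `z`, the function `g`, and the outcome value below are hypothetical choices made only for illustration; they are not taken from this work.

```python
def compose(g, z):
    """Return the estimator g o z: the random quantity omega -> g(z(omega))."""
    return lambda omega: g(z(omega))

# Hypothetical measurement: a random variable, i.e., a function of the outcome omega.
def z(omega):
    return 2.0 * omega + 1.0

# Hypothetical measurable map g applied to the measurement.
def g(zeta):
    return (zeta - 1.0) / 2.0

x_hat = compose(g, z)      # the estimator: still a function of omega
omega = 0.3                # one particular outcome
estimate = x_hat(omega)    # the estimate: a realization of the estimator
```

Here `x_hat` plays the role of Equation (3.98) and `estimate` the role of Equation (3.99).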

Remark Scharf [170] and others call the state estimator $g \circ z$ a statistic, i.e., a
function of one or more random vectors that does not depend on any unknown
parameters, and a sufficient statistic if $g \circ z$ carries all of the information about the
data $z$ [86]. In our work, the statistic may well be a transformation applied to a
function.

Definition 64 (Unbiased Estimate) Let $\hat{x}$ be an estimator of random vector $x$.
Whenever $E_{x,z}(\hat{x}) = E_x(x)$, we say that the estimator produces an unbiased estimate [14].

Theorem 65 (Conditional Mean Estimator) Let $(\mathbb{X}, \langle\cdot,\cdot\rangle_{\mathbb{X}})$ be a separable Hilbert space of $X$-valued random vectors, and $x \in \mathbb{X}$ be a random vector called the state. Let $(\mathbb{Z}, \langle\cdot,\cdot\rangle_{\mathbb{Z}})$ be a Hilbert space of $Z$-valued random vectors, and $z \in \mathbb{Z}$ be a random vector called the measurement. Then, the conditional (state) estimator is given by

$$\hat{x} = E(x|z) \tag{3.100}$$

and the error is then $[x - \hat{x}]$. The conditional estimator is endowed with the following properties:

1. The conditional state estimator $\hat{x}$ is an unbiased estimator.

2. The error, $[x - \hat{x}]$, is orthogonal to the measurement data $z$.

3. The error, $[x - \hat{x}]$, is orthogonal to the conditional estimator $\hat{x}$.

Proof of Theorem 65 The conditional estimator can be found by correlating the state and the statistic [170]⁹:

\begin{align}
E_{x,z}\{x \circ (g \circ z)\} &= E_z\{E_x[x \circ (g \circ z)\,|\,z]\} \tag{3.101} \\
&= E_z\{E_x(x|z) \circ (g \circ z)\} \tag{3.102} \\
&= E_z\{\hat{x} \circ (g \circ z)\} \tag{3.103}
\end{align}

where $\hat{x}$ is the conditional expectation

$$\hat{x} = E_x(x|z) = E(x|z) \tag{3.104}$$

which is a function of the data $z$ and it may or may not be a function of any unknown parameters. Equation (3.103) can be rewritten as

$$E_{x,z}\{x \circ (g \circ z)\} - E_z\{\hat{x} \circ (g \circ z)\} = 0 \tag{3.105}$$

and since $E_z\{\hat{x} \circ (g \circ z)\}$ is also $E_{x,z}\{\hat{x} \circ (g \circ z)\}$ then we get

$$E\{[x - \hat{x}] \circ (g \circ z)\} = 0 \tag{3.106}$$

Thus, we see that the error, $x - \hat{x}$, is orthogonal to the statistic $g \circ z$ per Definition 49 on page 3-31, since the cross-correlation is zero, i.e., $E\langle [x - \hat{x}], [g \circ z] \rangle = 0$.

⁹ This proof is inspired by the finite-dimensional case in Scharf [170].

To prove property 1, we let the state estimator¹⁰ in Equation (3.106) be a constant ones vector, i.e., $g \circ z = 1$, and obtain

$$E\{[x - \hat{x}] \circ 1\} = E[x - \hat{x}] \circ 1 = 0 \tag{3.107}$$

where $0$ is the zero operator. This implies that $E[x - \hat{x}] = 0$, where $0$ is now the zero vector. Thus $E(x) = E(\hat{x})$, hence the conditional estimator is unbiased.

Next, we show property 2. When the statistic is just the data, i.e., $g \circ z = z$, then we see that the error is orthogonal to the data

$$E\{[x - \hat{x}] \circ z\} = 0 \tag{3.108}$$

Finally, for property 3, since $\hat{x}$ is a function of only the data (a prerequisite for being a statistic), we let $g \circ z = \hat{x}$ to obtain

$$E\{[x - \hat{x}] \circ \hat{x}\} = 0 \tag{3.109}$$

Thus the error is orthogonal to the conditional estimator. ■
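The three properties of Theorem 65 can be checked exactly on a small discrete example. The sketch below is only a scalar, finite illustration: the joint probability mass function `pmf` is a hypothetical choice, and the outer product reduces to ordinary multiplication in the scalar case.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, z) on a small grid (illustrative numbers only).
pmf = {
    (0, 0): F(1, 4), (1, 0): F(1, 8),
    (0, 1): F(1, 8), (2, 1): F(1, 2),
}

def expect(f):
    """Joint expectation E_{x,z}[f(x, z)]."""
    return sum(p * f(x, z) for (x, z), p in pmf.items())

def cond_mean(z0):
    """Conditional mean E(x | z = z0)."""
    pz = sum(p for (x, z), p in pmf.items() if z == z0)
    return sum(p * x for (x, z), p in pmf.items() if z == z0) / pz

x_hat = cond_mean

# Property 1: the estimator is unbiased, E(x - x_hat) = 0.
bias = expect(lambda x, z: x - x_hat(z))
# Property 2: the error is orthogonal to the data z.
err_dot_z = expect(lambda x, z: (x - x_hat(z)) * z)
# Property 3: the error is orthogonal to the estimator itself.
err_dot_xhat = expect(lambda x, z: (x - x_hat(z)) * x_hat(z))
# Equation (3.106) for another statistic, here g(z) = z**2 + 3.
err_dot_g = expect(lambda x, z: (x - x_hat(z)) * (z ** 2 + 3))
```

Because exact rational arithmetic is used, all four quantities come out identically zero rather than merely small.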

Definition 66 (Minimum Variance Estimator) For state $x \in \mathbb{X} = L_2(\Omega, P; X)$ and observation $z \in \mathbb{Z} = L_2(\Omega, P; Z)$, the estimator $\hat{x} = g_o \circ z$, given measurement $z$, is called the minimum variance estimator if there exists an optimal Baire function $g_o$ such that [33]

$$\|x - g_o \circ z\| \le \|x - g \circ z\| \tag{3.110}$$

holds for every Baire function $g$. Furthermore, the minimum variance estimator of $x$ based on measurement $z$ is given by¹¹

$$\hat{x} = E(x|z) \tag{3.111}$$

¹⁰ Note that $g \circ z$ is not a statistic in this case since a ones vector is independent of the data.

Remark Since $L_2$ is a Hilbert space, the Projection Theorem (given in Theorems
35 and 36 on page 3-21) applies. Theorem 35 tells us that if $g_o$ exists, then it is
unique; hence the estimator $\hat{x}$ is unique. Additionally, the error vector is orthogonal
to every vector in the measurement space, i.e., $\langle x - (g_o \circ z), g \circ z \rangle = 0$ for every Baire
function $g$ and every measurement $z \in \mathbb{Z}$. The projection theorem in Theorem 36
says that a unique estimator exists for every measurement $z \in \mathbb{Z}$.

Definition 67 (Linear Estimator) Let $\hat{x}$ be an estimator of random vector $x$ and $z$ be a measurement used as an input to the estimator, $\hat{x} = g \circ z$, to produce an estimate, $\hat{x}(\omega) = g(z(\omega))$. The estimator, $\hat{x}$, is said to be linear whenever $z = \alpha_1 z_1 + \alpha_2 z_2$, for scalars $\alpha_1, \alpha_2$, yields

$$g(z(\omega)) = \alpha_1\, g(z_1(\omega)) + \alpha_2\, g(z_2(\omega)) \tag{3.112}$$

and nonlinear otherwise.

The Bayesian estimation technique rests on acquiring (either analytically or experimentally) the a posteriori PDF, so that we can calculate the conditional moments of the state given the measurements. The next step involves picking the optimality criterion that will be used to produce the optimal estimator [31, 170, 100]. For our research, we chose to minimize the quadratic cost function, $\mathcal{J}(e) = \|e\|^2$, where $e = x - \hat{x}$ is the estimation error; this cost function places a high cost on large errors and a small cost on small errors. Thus, we shall find the estimator that minimizes the mean-squared error (MSE).

¹¹ Technically speaking, the conditional mean estimator proposed herein is actually the minimum
variance estimator as the result of a theorem; see for example Catlin [33] for a proof that this is
indeed the definition for an estimator that achieves the minimum variance.

Definition 68 (Minimum mean-squared error estimator) Let $(X, \langle\cdot,\cdot\rangle_X)$ be a separable Hilbert space, $x \in \mathbb{X} = L_2(\Omega, P; X)$ be the random state vector, $(Z, \langle\cdot,\cdot\rangle_Z)$ be a separable Hilbert space, and $z \in \mathbb{Z} = L_2(\Omega, P; Z)$ be the measurement vector. The MMSE estimator is the random vector which minimizes the Bayesian MSE between the estimator, $\hat{x}_{mmse}$, and the true state, $x$, as [100, 170]

$$\hat{x}_{mmse} = \arg\min_{\hat{x}} \left[ E_{x,z}\left( \|x - \hat{x}\|_X^2 \right) \right] \tag{3.113}$$

Lemma 69 The solution to Equation (3.113) is the conditional mean, $\hat{x} = E(x|z)$.

Remark  Two  important  properties  of  this  estimator  are  (1)  it  is  unbiased,  i.e.,  the 
estimator  error  is  zero-mean,  and  (2)  the  estimator  error  and  the  measurements  are 
uncorrelated,  i.e.,  they  are  orthogonal  [14]. 

Proof of Lemma 69 We begin with the definition of the MSE and then expand after adding a “smart” zero, $[E(x|z) - E(x|z)]$, to obtain

\begin{align}
E_{x,z}\left(\|x - \hat{x}\|_X^2\right) &= E_{x,z}\left\{\|x + [E(x|z) - E(x|z)] - \hat{x}\|_X^2\right\} \tag{3.114} \\
&= E_{x,z}\left\{\langle x - E(x|z) + E(x|z) - \hat{x},\; x - E(x|z) + E(x|z) - \hat{x} \rangle_X\right\} \tag{3.115} \\
&= E_{x,z}\{\langle x - E(x|z), x - E(x|z) \rangle_X + \langle x - E(x|z), E(x|z) - \hat{x} \rangle_X \nonumber \\
&\qquad + \langle E(x|z) - \hat{x}, x - E(x|z) \rangle_X + \langle E(x|z) - \hat{x}, E(x|z) - \hat{x} \rangle_X\} \tag{3.116}
\end{align}

Then, taking the expectation over each term yields

\begin{align}
E_{x,z}\left(\|x - \hat{x}\|_X^2\right) &= E_{x,z}\{\langle x - E(x|z), x - E(x|z) \rangle_X\} + E_{x,z}\{\langle x - E(x|z), E(x|z) - \hat{x} \rangle_X\} \nonumber \\
&\quad + E_{x,z}\{\langle E(x|z) - \hat{x}, x - E(x|z) \rangle_X\} + E_{x,z}\{\langle E(x|z) - \hat{x}, E(x|z) - \hat{x} \rangle_X\} \tag{3.117}
\end{align}

By Theorem 65, we note that $[x - E(x|z)] \perp E(x|z)$ and $[x - E(x|z)] \perp \hat{x}$, thus the cross terms are zero. Note that $\hat{x}$ is a statistic since it is a function of only the data. With the cross terms gone and using the induced norm notation for the remaining terms, we get

$$E_{x,z}\left(\|x - \hat{x}\|_X^2\right) = E_{x,z}\left\{\|x - E(x|z)\|_X^2\right\} + E_{x,z}\left\{\|E(x|z) - \hat{x}\|_X^2\right\} \tag{3.118}$$

Thus, by inspection, $\hat{x} = E(x|z)$ minimizes Equation (3.118) since the first term does not depend on $\hat{x}$. ■
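The decomposition in Equation (3.118) can be verified exactly for a scalar, discrete example. In the sketch below the joint pmf and the competing estimator `candidate` are hypothetical choices used only to illustrate that any estimator other than the conditional mean pays the extra second term.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, z) (illustrative numbers only).
pmf = {
    (0, 0): F(1, 4), (1, 0): F(1, 8),
    (0, 1): F(1, 8), (2, 1): F(1, 2),
}

def expect(f):
    """Joint expectation E_{x,z}[f(x, z)]."""
    return sum(p * f(x, z) for (x, z), p in pmf.items())

def cond_mean(z0):
    """Conditional mean E(x | z = z0)."""
    pz = sum(p for (x, z), p in pmf.items() if z == z0)
    return sum(p * x for (x, z), p in pmf.items() if z == z0) / pz

def mse(estimator):
    """Bayesian mean-squared error of an estimator that is a function of z."""
    return expect(lambda x, z: (x - estimator(z)) ** 2)

def candidate(z):
    """Some other (suboptimal) estimator, chosen arbitrarily."""
    return F(1, 2) + z

first_term = mse(cond_mean)  # does not depend on the candidate
second_term = expect(lambda x, z: (cond_mean(z) - candidate(z)) ** 2)
total = mse(candidate)
```

With these numbers, `total` equals `first_term + second_term` exactly, and the second term vanishes only when the candidate coincides with the conditional mean.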

Therefore, this MMSE technique can be used to find a linear MVU estimator
$\hat{x} = g_o \circ z$; when the random vectors are jointly Gaussian, then we can obtain the
optimal MVU estimator without confining attention to only the optimum from the
class of linear estimators. Since many of the problems of interest to us feature a
generalized linear (or affine) relationship between the states and the observations, we
will eventually pose our estimation problem in terms of a linear measurement model
(see Definition 72). However, at this early stage of the development, we need only
assume  that  our  observations  and  states  are  statistically  correlated  with  nonzero 
cross-covariances  so  that  the  observations  contain  information  about  the  states;  this 
is  a  necessary  condition  for  using  the  conditional  expectation  to  find  a  (linear)  MVU 
estimator. 


Definition 70 (Correlated States and Observations Measurement Model) Let the random vector $z \in \mathbb{Z} = L_2(\Omega, P; Z)$ be the observation of a noise-corrupted measurement, $x \in \mathbb{X} = L_2(\Omega, P; X)$ be the random state that is to be estimated, and $Z$ and $X$ be separable Hilbert spaces. Furthermore, we have knowledge of the means

\begin{align}
\mu_z &= E(z) \nonumber \\
\mu_x &= E(x) \tag{3.119}
\end{align}

the nonzero covariances¹²

\begin{align}
\Sigma(z) &\triangleq E[(z - \mu_z) \circ (z - \mu_z)] \nonumber \\
\Sigma(x) &\triangleq E[(x - \mu_x) \circ (x - \mu_x)] \tag{3.120}
\end{align}

and the nonzero cross-covariances

\begin{align}
\Sigma(x,z) &= E[(x - \mu_x) \circ (z - \mu_z)] \nonumber \\
\Sigma(z,x) &= E[(z - \mu_z) \circ (x - \mu_x)] \tag{3.121}
\end{align}

The boxology for this measurement model is illustrated in Figure 3.6.

Finally, we come to our first result in this work that applies equally to problems
with possibly nonzero-mean random vectors, a LIMVUE for CSO; this is the first
step in our development of the ISKF.


Theorem 71 (LIMVUE for CSO) Given the measurement model in Definition 70 and assuming the inverse $\Sigma^{-1}(z)$ exists¹³, then the state estimator, denoted by $\hat{x} = E(x|z)$, the conditional expectation of the state, $x$, given an observation, $z$, is found by minimizing the MSE $E_{x,z}(\|x - \hat{x}\|_X^2)$ and is given by the general linear form [33]

$$\hat{x} = Kz + c \tag{3.122}$$

where

$$K = \Sigma(x,z)\,\Sigma^{-1}(z) \tag{3.123}$$

and the estimator bias term is

$$c = \mu_x - K\mu_z \tag{3.124}$$

The error, $e = x - \hat{x}$, is zero-mean and has covariance

$$\Sigma(e) = \Sigma(x) - K\,\Sigma(z,x) \tag{3.125}$$

Figure 3.6 Boxology of a Measurement Model. The lack of symmetry in this figure
is due to the process of “peeling” off information from the linear measurement model
boxology shown in Figure 3.8 on page 3-57.

¹² We have restricted ourselves to Hilbert spaces of Lebesgue $L_2$ functions to guarantee the existence of covariances; however, this does not imply that their inverses exist.

¹³ Even if the inverse does not exist, we can oftentimes find a suitable pseudoinverse [33] and still
employ this estimator. Note that this estimator will no longer be “optimal” in any sense, but it
may be a useful suboptimal algorithm nonetheless.

Remarks (1) We have not made any assumptions on the dimension of the state
or observation. Thus $\Sigma(x)$, $\Sigma(z)$, and $\Sigma(e)$ are covariance operators, while $\Sigma(x,z)$
and $\Sigma(z,x)$ are cross-covariance transformations as described in Definition 45, page
3-28. (2) The boxology for this estimator appears in Figure 3.7. A close look tells
us quite clearly that we (a) have not assumed a form (linear or otherwise) for the
relationship between the state and the observations, but (b) have assumed that the
estimator is affine.

Figure 3.7 Boxology of the LIMVUE for CSO. The lack of symmetry in this figure
is due to the process of “peeling” off information from the LIMVUE boxology shown
in Figure 3.9 on page 3-63.

Proof of Theorem 71 First, we will find the estimator bias term, $c$. Next, we
will show that the linear state estimator, $\hat{x}$, given by Equation (3.122), is unbiased.
Then we shall derive the optimal transformation $K$. Finally, we will show that the
error covariance, given in Equation (3.125), is correct.

To find $c$, we will minimize¹⁴ $\mathcal{J}(c) = E(\|x - \hat{x}\|^2) = E[\|x - (Kz + c)\|^2]$ with respect to $c$. We will use the same technique as we used to show that the conditional mean is the MMSE estimator in Lemma 69. First, let $y = x - Kz$, thus

$$\mathcal{J}(c) = E\left[\|x - (Kz + c)\|_X^2\right] = E\left[\|y - c\|_X^2\right] \tag{3.126}$$

Now expand the right-hand side with a wisely chosen zero $\mu_y - \mu_y$, where

$$\mu_y \triangleq E(y) = E(x - Kz) = \mu_x - K\mu_z \tag{3.127}$$

to obtain

\begin{align}
\mathcal{J}(c) &= E\left(\|y + \mu_y - \mu_y - c\|_X^2\right) \tag{3.128} \\
&= E\left(\langle y - \mu_y + \mu_y - c,\; y - \mu_y + \mu_y - c \rangle_X\right) \tag{3.129}
\end{align}

where the second line follows from the definition of the induced norm: on an inner product space, the norm squared is equal to the inner product of the quantity with itself. Continuing to expand we get

\begin{align}
\mathcal{J}(c) &= E(\langle y - \mu_y, y - \mu_y \rangle_X + \langle y - \mu_y, \mu_y - c \rangle_X \nonumber \\
&\qquad + \langle \mu_y - c, y - \mu_y \rangle_X + \langle \mu_y - c, \mu_y - c \rangle_X) \tag{3.130} \\
&= E(\langle y - \mu_y, y - \mu_y \rangle_X) + E(\langle y - \mu_y, \mu_y - c \rangle_X) \nonumber \\
&\qquad + E(\langle \mu_y - c, y - \mu_y \rangle_X) + E(\langle \mu_y - c, \mu_y - c \rangle_X) \tag{3.131}
\end{align}

¹⁴ Note that we have minimized the MSE using the joint expectation, $E_{x,z}(\cdot)$.

Simplify Equation (3.131) by noting that the cross terms are equal to zero since $E(y - \mu_y) = 0$ and $\mu_y - c$ is a nonrandom quantity, therefore

\begin{align}
\mathcal{J}(c) &= E(\langle y - \mu_y, y - \mu_y \rangle_X) + \langle \mu_y - c, \mu_y - c \rangle_X \tag{3.132} \\
&= E\left(\|y - \mu_y\|_X^2\right) + \|\mu_y - c\|_X^2 \tag{3.133}
\end{align}

where the first term is independent of $c$; therefore $\mathcal{J}(c)$ is minimized when $\|\mu_y - c\|_X^2$ is equal to zero, which occurs when $\mu_y - c = 0$, hence $c = \mu_y = \mu_x - K\mu_z$. Consequently, we can now write the state estimator, $\hat{x} = Kz + c$, as

$$\hat{x} = \mu_x + K(z - \mu_z) \tag{3.134}$$

The estimator bias is determined by taking the joint expectation of the error $e = x - \hat{x}$ [14]. Thus,

$$\mathrm{bias}(\hat{x}) = E(x - \hat{x}) = E\{x - [\mu_x + K(z - \mu_z)]\} \tag{3.135}$$

Rearranging yields

$$\mathrm{bias}(\hat{x}) = E[(x - \mu_x) - K(z - \mu_z)] \tag{3.136}$$

Finally, moving the expectation inside yields

$$\mathrm{bias}(\hat{x}) = E(x - \mu_x) - K\,E(z - \mu_z) = 0 \tag{3.137}$$

Therefore, the estimator is unbiased as claimed.

Now we seek the transformation $K \in \mathcal{BLT}(Z, X)$, as shown in Figure 3.7, that minimizes the expected value of the squared error between our affine estimator $\hat{x} = \mu_x + K(z - \mu_z)$, given observation $z$, and the state $x$. Thus, the cost function that we want to minimize is $\mathcal{J}(K) = E\{\|x - [\mu_x + K(z - \mu_z)]\|^2\}$; hence our task is to minimize $\mathcal{J}(K)$ with respect to $K$. We begin by writing and then expanding the expression for $\mathcal{J}(K)$

\begin{align}
\mathcal{J}(K) &= E\left[\|(x - \mu_x) - K(z - \mu_z)\|^2\right] \tag{3.138} \\
&= E[\langle (x - \mu_x) - K(z - \mu_z),\; (x - \mu_x) - K(z - \mu_z) \rangle] \tag{3.139} \\
&= \mathrm{tr}\,E\{[(x - \mu_x) - K(z - \mu_z)] \circ [(x - \mu_x) - K(z - \mu_z)]\} \tag{3.140}
\end{align}

where the second line follows from the definition of the induced norm and the third line is a powerful identity that relates an inner product to the trace of an outer product (for any type of expectation operator) as stated in Equation (3.75) on page 3-31. Note that $E\{[(x - \mu_x) - K(z - \mu_z)] \circ [(x - \mu_x) - K(z - \mu_z)]\}$ is the error correlation or, equivalently, the error covariance, $\Sigma(e)$, since the error is zero-mean. Continuing,

\begin{align}
\mathcal{J}(K) &= \mathrm{tr}\,E\{[(x - \mu_x) \circ (x - \mu_x)] - [(x - \mu_x) \circ K(z - \mu_z)] \nonumber \\
&\qquad - [K(z - \mu_z) \circ (x - \mu_x)] + [K(z - \mu_z) \circ K(z - \mu_z)]\} \tag{3.141} \\
&= \mathrm{tr}\{E[(x - \mu_x) \circ (x - \mu_x)] - E[(x - \mu_x) \circ K(z - \mu_z)] \nonumber \\
&\qquad - E[K(z - \mu_z) \circ (x - \mu_x)] + E[K(z - \mu_z) \circ K(z - \mu_z)]\} \tag{3.142} \\
&= \mathrm{tr}\{E[(x - \mu_x) \circ (x - \mu_x)] - E[(x - \mu_x) \circ (z - \mu_z)]K^* \nonumber \\
&\qquad - K\,E[(z - \mu_z) \circ (x - \mu_x)] + K\,E[(z - \mu_z) \circ (z - \mu_z)]K^*\} \tag{3.143} \\
&= \mathrm{tr}[\Sigma(x) - \Sigma(x,z)K^* - K\,\Sigma(z,x) + K\,\Sigma(z)K^*] \tag{3.144}
\end{align}

where the second line is due to the linearity of the expectation operator and the third line is due to Lemma 43, page 3-26; $K^*$ is the adjoint of transformation $K$; and covariance operator notation is used in the fourth line.

If $\mathcal{J}$ were a function of a scalar, vector, or matrix, then we could use standard calculus to minimize the function; however, in this problem, $\mathcal{J}$ is a function of a transformation and hence the rules of calculus [5] and/or vector calculus [126] do not apply. So, we will use a calculus of variations method [208, 60, 122, 8] to minimize a function with respect to a transformation. For positive $\alpha \in \mathbb{R}$ and transformations $K, L \in \mathcal{BLT}(Z, X)$, the Gateaux variation, $\delta\mathcal{J}(K; L)$, defined as

$$\delta\mathcal{J}(K; L) = \lim_{\alpha \to 0} \frac{\mathcal{J}(K + \alpha L) - \mathcal{J}(K)}{\alpha} \tag{3.145}$$

can be used to minimize $\mathcal{J}(K)$ with respect to $K$. To find the optimal $K$, we assume that $K$ is optimal when the variation is zero for all $L$ and then solve for the optimal $K$; this is a necessary optimality condition [208, 122]. Substituting $K + \alpha L$ in for $K$ in Equation (3.144) we can write


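\begin{align}
\mathcal{J}(K + \alpha L) &= \mathrm{tr}[\Sigma(x) - \Sigma(x,z)(K + \alpha L)^* \nonumber \\
&\qquad - (K + \alpha L)\Sigma(z,x) + (K + \alpha L)\Sigma(z)(K + \alpha L)^*] \tag{3.146} \\
&= \mathrm{tr}[\Sigma(x) - \Sigma(x,z)K^* - \Sigma(x,z)\alpha L^* \nonumber \\
&\qquad - K\Sigma(z,x) - \alpha L\Sigma(z,x) + K\Sigma(z)K^* \nonumber \\
&\qquad + K\Sigma(z)\alpha L^* + \alpha L\Sigma(z)K^* + \alpha L\Sigma(z)\alpha L^*] \tag{3.147}
\end{align}

Subtracting Equation (3.144) from Equation (3.147) yields

\begin{align}
\mathcal{J}(K + \alpha L) - \mathcal{J}(K) &= \mathrm{tr}[K\Sigma(z)\alpha L^* + \alpha L\Sigma(z)K^* + \alpha L\Sigma(z)\alpha L^* \nonumber \\
&\qquad - \alpha L\Sigma(z,x) - \Sigma(x,z)\alpha L^*] \tag{3.148}
\end{align}

Divide by $\alpha$ to yield

\begin{align}
\frac{\mathcal{J}(K + \alpha L) - \mathcal{J}(K)}{\alpha} &= \mathrm{tr}[K\Sigma(z)L^* + L\Sigma(z)K^* + \alpha L\Sigma(z)L^* \nonumber \\
&\qquad - L\Sigma(z,x) - \Sigma(x,z)L^*] \tag{3.149}
\end{align}

and take the limit as $\alpha \to 0$; then Equation (3.145) becomes

$$\delta\mathcal{J}(K; L) = \mathrm{tr}[K\Sigma(z)L^* + L\Sigma(z)K^* - L\Sigma(z,x) - \Sigma(x,z)L^*] \tag{3.150}$$

Factoring out $L^*$ and $L$ results in

$$\delta\mathcal{J}(K; L) = \mathrm{tr}\{[K\Sigma(z) - \Sigma(x,z)]L^* + L[\Sigma(z)K^* - \Sigma(z,x)]\} \tag{3.151}$$

and since $\mathrm{tr}\,A^* = \mathrm{tr}\,A$,

$$\delta\mathcal{J}(K; L) = 2\,\mathrm{tr}\{[K\Sigma(z) - \Sigma(x,z)]L^*\} \tag{3.152}$$

Assume the optimal $K_o$ minimizes $\mathcal{J}$. Then a necessary optimality condition [122] yields the differential $\delta\mathcal{J}(K_o; L) = 0$ for all $L$, and we get

$$\mathrm{tr}\{[K_o\Sigma(z) - \Sigma(x,z)]L^*\} = 0 \quad \text{for all } L \tag{3.153}$$

which implies that

$$K_o\Sigma(z) - \Sigma(x,z) = 0 \tag{3.154}$$

Rearranging and assuming that our measurement covariance, $\Sigma(z)$, is invertible¹⁵

$$K_o = \Sigma(x,z)\,\Sigma^{-1}(z) \tag{3.155}$$

Hence the state estimator, $\hat{x} = \mu_x + K_o(z - \mu_z)$ from Equation (3.134), becomes

$$\hat{x} = \mu_x + \Sigma(x,z)\,\Sigma^{-1}(z)\,(z - \mu_z) \tag{3.156}$$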
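The Gateaux variation argument can be checked numerically in the finite-dimensional case, where the operators become real matrices and the adjoint becomes the transpose. The sketch below is an illustration under that assumption only; the 2x2 matrices `S_x`, `S_z`, `S_xz`, `K`, and `L` are hypothetical numbers, and the small matrix helpers are written out so the block needs only the standard library.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def add(A, B, s=1):
    """Return A + s*B elementwise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical covariances (S_z symmetric, S_zx the transpose of S_xz).
S_x = [[F(2), F(0)], [F(0), F(1)]]
S_z = [[F(3), F(1)], [F(1), F(2)]]
S_xz = [[F(1), F(0)], [F(1), F(1)]]
S_zx = transpose(S_xz)

def cost(K):
    """J(K) = tr[S_x - S_xz K* - K S_zx + K S_z K*], as in Equation (3.144)."""
    Kt = transpose(K)
    M = add(S_x, matmul(S_xz, Kt), -1)
    M = add(M, matmul(K, S_zx), -1)
    M = add(M, matmul(matmul(K, S_z), Kt))
    return trace(M)

def gateaux(K, L):
    """Closed form 2 tr{[K S_z - S_xz] L*}, as in Equation (3.152)."""
    return 2 * trace(matmul(add(matmul(K, S_z), S_xz, -1), transpose(L)))

K = [[F(1), F(2)], [F(0), F(1)]]
L = [[F(1), F(-1)], [F(2), F(3)]]
alpha = F(1, 1000)

# Difference quotient of (3.149); it exceeds the variation by alpha * tr(L S_z L*).
quotient = (cost(add(K, L, alpha)) - cost(K)) / alpha
remainder = alpha * trace(matmul(matmul(L, S_z), transpose(L)))

# Optimal gain K_o = S_xz S_z^{-1}, using the explicit 2x2 inverse.
det = S_z[0][0] * S_z[1][1] - S_z[0][1] * S_z[1][0]
S_z_inv = [[S_z[1][1] / det, -S_z[0][1] / det],
           [-S_z[1][0] / det, S_z[0][0] / det]]
K_o = matmul(S_xz, S_z_inv)
```

In this example the difference quotient minus the remainder term matches the closed-form variation exactly, and the variation at `K_o` is zero for the chosen direction `L`, consistent with Equations (3.153) through (3.155).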


¹⁵ Per Definition 45, the covariance operator is only guaranteed to be positive, which means that
it may have one (or more) eigenvalues which are zero. To be invertible, an operator may not have
any zero eigenvalues. However, in a practical system, the measurements are not perfect and hence
all of the eigenvalues are positive, versus nonnegative. Therefore, it is not restrictive to assume that
$\Sigma(z)$ is invertible. Additionally, we later show that, for the case of a linear measurement model
with additive noise, it is sufficient to assume that the noise covariance operator is strictly positive
to guarantee the invertibility of $\Sigma(z)$.

Since Equation (3.144) is the trace of the error covariance, we have

\begin{align}
\Sigma(e) &= \Sigma(x) - \Sigma(x,z)K^* - K\,\Sigma(z,x) + K\,\Sigma(z)K^* \tag{3.157} \\
&= \Sigma(x) - K\,\Sigma(z,x) + [K\,\Sigma(z) - \Sigma(x,z)]K^* \tag{3.158}
\end{align}

Using the expression in Equation (3.154) yields

$$\Sigma(e) = \Sigma(x) - K\,\Sigma(z,x) \tag{3.125}$$

Therefore, all of the parts of this theorem have been proved. ■
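Theorem 71 can be illustrated in the simplest setting of scalar state and scalar observation, where $\Sigma(x,z)$ and $\Sigma(z)$ are numbers. The joint pmf below is a hypothetical example chosen only so that the moments can be computed exactly; nothing about the relationship between `x` and `z` is assumed beyond their being correlated.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, z) for correlated state and observation.
pmf = {(0, 0): F(1, 4), (1, 1): F(1, 4), (1, 2): F(1, 4), (3, 2): F(1, 4)}

def expect(f):
    """Joint expectation E_{x,z}[f(x, z)]."""
    return sum(p * f(x, z) for (x, z), p in pmf.items())

mu_x = expect(lambda x, z: x)
mu_z = expect(lambda x, z: z)
S_x = expect(lambda x, z: (x - mu_x) ** 2)            # Sigma(x)
S_z = expect(lambda x, z: (z - mu_z) ** 2)            # Sigma(z)
S_xz = expect(lambda x, z: (x - mu_x) * (z - mu_z))   # Sigma(x,z) = Sigma(z,x) for scalars

K = S_xz / S_z            # Equation (3.123)
c = mu_x - K * mu_z       # Equation (3.124)

def mse(gain, offset):
    """MSE of the affine estimator gain*z + offset."""
    return expect(lambda x, z: (x - (gain * z + offset)) ** 2)

bias = expect(lambda x, z: x - (K * z + c))
err_cov = S_x - K * S_xz  # Equation (3.125)
```

For these numbers the MSE of the affine estimator equals `err_cov`, and perturbing either `K` or `c` away from the values above increases the MSE.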

Next,  we  employ  a  generalized  linear  measurement  model  with  zero-mean  white 
Gaussian  additive  noise  (WGAN). 

Definition 72 (Generalized Linear Measurement Model) The generalized linear measurement model is represented by the algebraic equation

$$z = Hx + v \tag{3.159}$$

where:

$z \in \mathbb{Z} = L_2(\Omega, P; Z)$: measurement vector
$H \in \mathcal{BLT}(X, Z)$: measurement distributor transformation
$x \in \mathbb{X} = L_2(\Omega, P; X)$: state vector
$v \in \mathbb{V} = L_2(\Omega, P; V)$: measurement-corruption noise vector

and $H$ is known and $Z$, $X$, and $V$ are separable Hilbert spaces. Additionally, the measurement is corrupted by zero-mean white Gaussian additive noise, $v$, with a known covariance operator

$$\Sigma(v) = E(v \circ v) = R \in \mathcal{BLO}(V) \tag{3.160}$$

which is symmetric and nuclear since it is a covariance operator (see Definition 45) and strictly positive by the nonrestrictive reasoning given in the second remark following Definition 62. The mean of the state is $\mu_x$ and the covariance operator for the random state vector given by

$$\Sigma(x) = E[(x - \mu_x) \circ (x - \mu_x)] = P \in \mathcal{BLO}(X) \tag{3.161}$$

is symmetric, positive, and nuclear per Definition 45. Furthermore, we note that the state and measurement-corruption noise are independent¹⁶.

Figure 3.8 Boxology of a Generalized Linear Measurement Model.

The  boxology  for  this  linear  measurement  model  is  illustrated  in  Figure  3.8. 

¹⁶ The correctness of this statement will be demonstrated after we have completed the ISKF proof.
It will be seen that the state and the measurement-corruption noise are independent because the
measurement-corruption noise is mutually independent of the dynamics driving noise and the initial
state.

The following lemma establishes, for random vectors on infinite-dimensional Hilbert spaces, the counterpart of the familiar finite-dimensional result that independent random vectors are also uncorrelated, hence their cross-covariances are zero.

Lemma 73 Let $X$ and $V$ be separable Hilbert spaces and $x \in \mathcal{L}(\Omega, X)$ and $v \in \mathcal{L}(\Omega, V)$ be $X$- and $V$-valued random vectors, respectively. If $x$ and $v$ are independent, then they are also uncorrelated and thus the cross-covariance transformations are zero, i.e., $\Sigma(x,v) = 0$ and $\Sigma(v,x) = 0$.

Proof of Lemma 73 From Definition 45, the cross-covariance of random vectors $x$ and $v$ is given by

\begin{align}
\Sigma(x,v) &= E[(x - \mu_x) \circ (v - \mu_v)] \tag{3.162} \\
&= E[(x \circ v) - (\mu_x \circ v) - (x \circ \mu_v) + (\mu_x \circ \mu_v)] \tag{3.163} \\
&= E(x \circ v) - E(\mu_x \circ v) - E(x \circ \mu_v) + E(\mu_x \circ \mu_v) \tag{3.164}
\end{align}

We shall show, in turn, that all four terms of Equation (3.164) are equal to $\mu_x \circ \mu_v$.

The first term, $E(x \circ v)$, is the correlation transformation of random vector $x$ with random noise vector $v$. For all $\eta \in V$ we have

$$[E(x \circ v)]\eta = E[(x \circ v)\eta] = E[x \langle v, \eta \rangle] \tag{3.165}$$

where the last equality follows from the definition of the outer product. Since $x$ and $v$ are independent, then $x$ and a linear function of $v$, namely the functional $\langle v, \eta \rangle$, are independent, thus

$$[E(x \circ v)]\eta = E(x)\,E(\langle v, \eta \rangle) = \mu_x \langle E(v), \eta \rangle = (\mu_x \circ \mu_v)\eta \tag{3.166}$$

for all $\eta \in V$. Therefore, we get $E(x \circ v) = \mu_x \circ \mu_v$.

The second and third terms on the right-hand side of Equation (3.164) follow from similar arguments and, thus, only the second term will be explicitly shown. For every $\eta \in V$,

$$[E(\mu_x \circ v)]\eta = E[(\mu_x \circ v)\eta] = E[\mu_x \langle v, \eta \rangle] \tag{3.167}$$

Then, since $\mu_x$ is not random, we obtain

$$[E(\mu_x \circ v)]\eta = \mu_x E(\langle v, \eta \rangle) = \mu_x \langle E(v), \eta \rangle = (\mu_x \circ \mu_v)\eta \tag{3.168}$$

Thus, $E(\mu_x \circ v) = \mu_x \circ \mu_v$ and $E(x \circ \mu_v) = \mu_x \circ \mu_v$.

Clearly, $E(\mu_x \circ \mu_v) = \mu_x \circ \mu_v$, since the means are not random. Therefore, all four terms are $\mu_x \circ \mu_v$, they cancel in Equation (3.164), and the lemma holds. An analogous set of steps will lead us to the conclusion that $\Sigma(v,x) = 0$. ■
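Lemma 73 can be illustrated in finite dimensions, where the outer product $x \circ v$ acts as the matrix $x v^T$. The sketch below builds a joint distribution as the product of two marginals (which is what independence means here) and computes the cross-covariance exactly; the marginal pmfs `px` and `pv` are hypothetical values.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginal pmfs for 2-vectors x and v.
px = {(0, 1): F(1, 3), (2, -1): F(2, 3)}
pv = {(1, 0): F(1, 4), (3, 5): F(3, 4)}

# Independence: the joint pmf is the product of the marginals.
joint = {(x, v): p * q
         for (x, p), (v, q) in product(px.items(), pv.items())}

def mean(marginal):
    return [sum(p * val[i] for val, p in marginal.items()) for i in range(2)]

mu_x, mu_v = mean(px), mean(pv)

# Correlation E(x o v) and the outer product of the means, mu_x o mu_v.
corr = [[sum(p * x[i] * v[j] for (x, v), p in joint.items())
         for j in range(2)] for i in range(2)]
mean_outer = [[mu_x[i] * mu_v[j] for j in range(2)] for i in range(2)]

# Cross-covariance Sigma(x, v) = E[(x - mu_x) o (v - mu_v)].
cross_cov = [[sum(p * (x[i] - mu_x[i]) * (v[j] - mu_v[j])
                  for (x, v), p in joint.items())
              for j in range(2)] for i in range(2)]
```

Here `corr` equals `mean_outer`, mirroring Equation (3.166), and every entry of `cross_cov` is exactly zero.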

In Theorem 75, we require that the measurement covariance operator, $\Sigma(z) = HPH^* + R$, be invertible. Hence we shall attend to it now in the following lemma.

Lemma 74 (Measurement Covariance Operator) Let $H$, $P$, and $R$ be as described in Definition 72, then the inverse of $\Sigma(z) = HPH^* + R$ exists.

Proof of Lemma 74 First, we show that $\Sigma(z)$ is $HPH^* + R$. We begin with the definition of the covariance operator and then substitute $z = Hx + v$ and

$$\mu_z = E(z) = E(Hx + v) = H\,E(x) + E(v) = H\mu_x \tag{3.169}$$

into $\Sigma(z)$ and then regroup terms to get

\begin{align}
\Sigma(z) &\triangleq E[(z - \mu_z) \circ (z - \mu_z)] \tag{3.170} \\
&= E[(Hx + v - H\mu_x) \circ (Hx + v - H\mu_x)] \tag{3.171} \\
&= E\{[H(x - \mu_x) + v] \circ [H(x - \mu_x) + v]\} \tag{3.172}
\end{align}

Now employ the distributive property of the outer product (shown in Lemma 16, page 3-11) to obtain

\begin{align}
\Sigma(z) &= E\{[H(x - \mu_x) \circ H(x - \mu_x)] + [H(x - \mu_x) \circ v] \nonumber \\
&\qquad + [v \circ H(x - \mu_x)] + (v \circ v)\} \tag{3.173}
\end{align}

Moving the expectation in, then using Lemma 43 on page 3-26 to factor the nonrandom operators out of the expectation yields

\begin{align}
\Sigma(z) &= E[H(x - \mu_x) \circ H(x - \mu_x)] + E[H(x - \mu_x) \circ v] \nonumber \\
&\qquad + E[v \circ H(x - \mu_x)] + E(v \circ v) \tag{3.174} \\
&= H\,E[(x - \mu_x) \circ (x - \mu_x)]H^* + H\,E[(x - \mu_x) \circ v] \nonumber \\
&\qquad + E[v \circ (x - \mu_x)]H^* + E(v \circ v) \tag{3.175} \\
&= H\,\Sigma(x)H^* + H\,\Sigma(x,v) + \Sigma(v,x)H^* + \Sigma(v) \tag{3.176}
\end{align}

and the third line results from the definition of the covariance operator and the distributive property of the outer product.

Therefore, we can write Equation (3.176) as

$$\Sigma(z) = H\,\Sigma(x)H^* + \Sigma(v) = HPH^* + R \tag{3.177}$$

since both cross-covariance terms are zero per Lemma 73.

Next we show that $HPH^* + R$ has an inverse. In general, we know that a covariance operator is positive. Thus, a sufficient condition for our lemma is that either one of the terms must be strictly positive. By our assumed specifications on the measurement-corruption noise covariance, $R$ is strictly positive (or positive definite if $R$ is the matrix representation of an operator). Let us begin our proof by contradiction by supposing that $HPH^* + R$ is not strictly positive, i.e., that there exists a nonzero $\zeta \in Z$ for which the following holds

$$\langle (HPH^* + R)\zeta, \zeta \rangle_Z \le 0 \tag{3.178}$$

Using the additivity property of the inner product we get

$$\langle HPH^*\zeta, \zeta \rangle_Z + \langle R\zeta, \zeta \rangle_Z \le 0 \tag{3.179}$$

We know that the second term, $\langle R\zeta, \zeta \rangle_Z$, on the left-hand side of Equation (3.179) is greater than zero, hence the first term must be negative (a necessary condition) for the equation to hold true. For $H \in \mathcal{BLT}(X, Z)$, the adjoint of $H$ is $H^* \in \mathcal{BLT}(Z, X)$ and thus

$$\langle HPH^*\zeta, \zeta \rangle_Z = \langle PH^*\zeta, H^*\zeta \rangle_X \tag{3.180}$$

where $H^*\zeta = \xi$ and $\xi \in X$, so Equation (3.180) becomes

$$\langle HPH^*\zeta, \zeta \rangle_Z = \langle P\xi, \xi \rangle_X \tag{3.181}$$

We know that the state covariance $P$, defined in Equation (3.161), is positive, i.e., for every $\xi \in X$,

$$\langle P\xi, \xi \rangle_X \ge 0 \tag{3.182}$$

Thus, we have a contradiction and hence $HPH^* + R$ is strictly positive and therefore invertible. ■
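The role of the strictly positive $R$ can be seen in a small matrix example. The sketch below is a finite-dimensional illustration only: `H`, `P`, and `R` are hypothetical 2x2 matrices, with `P` deliberately singular so that $HPH^*$ alone is not invertible, and the adjoint is taken to be the transpose.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def quad(A, v):
    """Quadratic form <A v, v>."""
    return inner(matvec(A, v), v)

H = [[F(1), F(0)], [F(0), F(0)]]
P = [[F(1), F(1)], [F(1), F(1)]]           # positive but singular
R = [[F(1, 10), F(0)], [F(0), F(1, 5)]]    # strictly positive

HPHt = matmul(matmul(H, P), transpose(H))
S = [[HPHt[i][j] + R[i][j] for j in range(2)] for i in range(2)]

samples = [[F(1), F(0)], [F(0), F(1)], [F(1), F(-1)], [F(-3), F(2)]]

# Identity (3.180)-(3.181): <H P H* zeta, zeta> = <P xi, xi> with xi = H* zeta.
identity_holds = all(
    quad(HPHt, zeta) == quad(P, matvec(transpose(H), zeta)) for zeta in samples
)
# The sum H P H* + R has a strictly positive quadratic form on every sample.
strictly_positive = all(quad(S, zeta) > 0 for zeta in samples)

det_HPHt = HPHt[0][0] * HPHt[1][1] - HPHt[0][1] * HPHt[1][0]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
```

In this example `det_HPHt` is zero while `det_S` is positive, so adding the strictly positive `R` is what makes the measurement covariance invertible. Note that checking a handful of sample vectors is only a spot check, not a proof of positive definiteness.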

Now we are ready to solve a more specific problem using Theorem 71 in conjunction with the new generalized linear measurement model given in Definition 72. With this new measurement model, the LIMVUE for CSO now becomes the LIMVUE¹⁷ for the generalized linear measurement model.

¹⁷ For finite-dimensional systems, the linear MVU estimator (LMVUE) is also the best linear
unbiased estimator (BLUE), in the MMSE sense, for the class of linear and unbiased estimators.
The LMVUE (or BLUE) with jointly Gaussian random vectors is the overall best (optimal) MVU
estimator [100, 14]; hence there are no nonlinear estimators that are better, in the MMSE sense.
We expect that this LIMVUE is the overall best infinite-dimensional MVU estimator (IMVUE).


Theorem 75 (LIMVUE) We employ the generalized linear measurement model given in Definition 72 to describe the random vectors representing the measurement, state, and measurement-corruption noise. The LIMVUE is

$$\hat{x} = \mu_x + PH^*[HPH^* + R]^{-1}(z - \mu_z) \tag{3.183}$$

where $\mu_z = H\mu_x$ and the error covariance operator is

$$\Sigma(e) = P - PH^*[HPH^* + R]^{-1}HP \tag{3.184}$$

Since the LIMVUE is simply the LIMVUE for CSO with the generalized linear
measurement model, the boxology for the LIMVUE (illustrated in Figure 3.9) is a
combination of the LIMVUE for CSO boxology, see Figure 3.7 on page 3-50, and
the boxology of the generalized linear measurement model shown in Figure 3.8, page
3-57.

Proof of Theorem 75 Per Theorem 71, the estimator is indeed unbiased. We shall use Theorem 71 two more times to find the estimator and the error covariance for this theorem.

The LIMVUE for CSO, $\hat{x} = \mu_x + \Sigma(x,z)\,\Sigma^{-1}(z)\,(z - \mu_z)$, as given in Equation (3.156), is our starting point. We will first determine a new expression for the cross-covariance $\Sigma(x,z)$ by substituting $z = Hx + v$ and

$$\mu_z = E(z) = E(Hx + v) = H\,E(x) + E(v) = H\mu_x \tag{3.185}$$

into the definition of $\Sigma(x,z)$ to get

\begin{align}
\Sigma(x,z) &= E[(x - \mu_x) \circ (z - \mu_z)] \tag{3.186} \\
&= E[(x - \mu_x) \circ (Hx + v - H\mu_x)] \tag{3.187}
\end{align}

Figure 3.9 Boxology of the Linear Infinite-Dimensional Minimum Variance Unbiased Estimator.

Applying the distributive property in Lemma 16 (found on page 3-11) results in

$$\Sigma(x,z) = E\{[(x - \mu_x) \circ (Hx - H\mu_x)] + [(x - \mu_x) \circ v]\} \tag{3.188}$$

Then it follows from the linearity of the expectation operator and, in the second line, from the definition of the cross-covariance

\begin{align}
\Sigma(x,z) &= E[(x - \mu_x) \circ (Hx - H\mu_x)] + E[(x - \mu_x) \circ v] \tag{3.189} \\
&= E[(x - \mu_x) \circ H(x - \mu_x)] + \Sigma(x,v) \tag{3.190}
\end{align}

The cross-covariance $\Sigma(x,v)$ is zero since $x$ and $v$ are independent (and thus uncorrelated)¹⁸, hence Equation (3.190) becomes

\begin{align}
\Sigma(x,z) &= E[(x - \mu_x) \circ H(x - \mu_x)] \tag{3.191} \\
&= E[(x - \mu_x) \circ (x - \mu_x)]H^* \tag{3.192} \\
&= PH^* \tag{3.193}
\end{align}

where the second line follows from Lemma 43 given on page 3-26. Substituting the expressions in Equations (3.177) and (3.193) into Equation (3.156) yields

$$\hat{x} = \mu_x + PH^*[HPH^* + R]^{-1}(z - H\mu_x) \tag{3.183}$$

18 See Lemma 73, page 3-58.

Next we'll show the error covariance operator given in Equation (3.184). We've already determined $\Sigma(x,z)$ and $\Sigma(z)$ per the assumptions for this theorem. Now, substituting $z = Hx + v$ and $\mu_z = H\mu_x$ into the definition for cross-covariance $\Sigma(z,x)$ and then rearranging as necessary yields

\begin{align}
\Sigma(z,x) &= E[(z - \mu_z) \circ (x - \mu_x)] \tag{3.194} \\
&= E[(Hx + v - H\mu_x) \circ (x - \mu_x)] \tag{3.195} \\
&= E\{[H(x - \mu_x) + v] \circ (x - \mu_x)\} \tag{3.196} \\
&= E\{[H(x - \mu_x) \circ (x - \mu_x)] + [v \circ (x - \mu_x)]\} \tag{3.197}
\end{align}

Then applying the expectation operator to both terms gives

\begin{align}
\Sigma(z,x) &= E[H(x - \mu_x) \circ (x - \mu_x)] + E[v \circ (x - \mu_x)] \tag{3.198} \\
&= H\,E[(x - \mu_x) \circ (x - \mu_x)] + \Sigma(v,x) \tag{3.199} \\
&= H P \tag{3.200}
\end{align}

where we note that $\Sigma(v,x) = 0$ since $v$ and $x$ are independent.

Finally, substituting the covariances found in Equations (3.161), (3.193), (3.177), and (3.200) into

\[ \Sigma(e) = \Sigma(x) - K\,\Sigma(z,x) = \Sigma(x) - \Sigma(x,z)\,\Sigma^{-1}(z)\,\Sigma(z,x) \tag{3.125} \]

yields

\[ \Sigma(e) = P - P H^* [H P H^* + R]^{-1} H P \tag{3.184} \]

Q.E.D. ■
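As a rough numerical illustration of Theorem 75, the sketch below evaluates Equations (3.183) and (3.184) in the finite-dimensional special case, where the operators P, H, and R become matrices and the adjoint H* becomes a transpose. The function name `limvue` and every numerical value are assumptions chosen for illustration only; they are not part of the development above.

```python
import numpy as np

def limvue(mu_x, P, H, R, z):
    """Finite-dimensional analogue of Equations (3.183) and (3.184)."""
    S = H @ P @ H.T + R                  # measurement covariance H P H* + R
    K = np.linalg.solve(S, H @ P).T      # gain P H* S^{-1} (P and S are symmetric)
    x_hat = mu_x + K @ (z - H @ mu_x)    # estimator, Equation (3.183)
    Sigma_e = P - K @ H @ P              # error covariance, Equation (3.184)
    return x_hat, Sigma_e

# Hypothetical two-state, one-measurement example.
mu_x = np.array([1.0, 0.0])              # prior mean of the state
P = np.diag([4.0, 1.0])                  # prior state covariance
H = np.array([[1.0, 1.0]])               # measurement distributor
R = np.array([[1.0]])                    # measurement-corruption noise covariance
z = np.array([3.0])                      # one measurement realization

x_hat, Sigma_e = limvue(mu_x, P, H, R, z)
```

In this example the error covariance stays symmetric and its trace is smaller than that of P, which is what one expects from Equation (3.184), since the subtracted term is positive.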

The operation of a Kalman filter is a natural two-step recursive process, consisting of a state update with the latest measurement and a state prediction based on the dynamics model. In the final theorem of this section, we present the LIMVUE for a stochastic state process using a stochastic measurement process. We will accomplish this by generalizing the LIMVUE, given in Theorem 75, for a stochastic measurement process which is a generalization of Definition 72.

Definition 76 (Generalized Linear Stochastic Measurement Model) Let $z$, $x$, and $v$ be discrete-time stochastic processes which map the product space $T \times \Omega$ into their respective realization spaces $Z$, $X$, and $V$, where $T \subset \mathbb{R}^+$ and $\Omega$ is the sample space, a nonempty set associated with a complete probability space $(\Omega, \mathcal{F}, P)$. At each time $t_i \in T$, $z(t_i) = z(t_i, \cdot) \in \mathcal{Z}$, $x(t_i) = x(t_i, \cdot) \in \mathcal{X}$, and $v(t_i) = v(t_i, \cdot) \in \mathcal{V}$ are random vectors. The measurement process model is defined by

\[ z(t_i) = H(t_i)\,x(t_i) + v(t_i) \tag{3.201} \]

where:

$z(t_i) \in \mathcal{Z} = L_2(\Omega, P; Z)$ ... measurement vector
$H(t_i) \in \mathcal{BLT}(X, Z)$ ... measurement distributor transformation
$x(t_i) \in \mathcal{X} = L_2(\Omega, P; X)$ ... state vector
$v(t_i) \in \mathcal{V} = L_2(\Omega, P; V)$ ... measurement-corruption noise vector

Additionally, $H(t_i)$ is known and the white measurement-corruption noise and state covariances are defined as

\[ \Sigma[v(t_i), v(t_j)] = E[v(t_i) \circ v(t_j)] = R(t_i)\,\delta_{ij} \tag{3.202} \]

\[ \Sigma[x(t_i), x(t_i)] = E\{[x(t_i) - \mu_x(t_i)] \circ [x(t_i) - \mu_x(t_i)]\} = P(t_i) \tag{3.203} \]

Since $v(t_i)$ and $x(t_j)$ are independent for all times $t_i$ and $t_j$, and $E[v(t_i)] = 0$, the cross-correlations and cross-covariances are zero. Per Definition 45, the state covariance operator, $P(t_i)$, is symmetric, positive, and nuclear for all time $t_i$, whereas the measurement-corruption noise covariance operator, $R(t_i)$, is symmetric, strictly positive, and nuclear for all time $t_i$.

The  boxology  for  this  generalized  stochastic  measurement  model  is  given  in  Figure 
3.10. 

While the measurement history, defined in Equation (2.18) on page 2-14, “stored” the elements in a growing vector, the following definition uses a set because the measurement vector may be infinite-dimensional, hence the measurement history may be infinite-dimensional.

Figure 3.10 Boxology of a Generalized Linear Stochastic Measurement Model.

Definition 77 (Measurement History) Let $z(\cdot,\cdot)$ be a discrete-time stochastic measurement process in accordance with the generalized linear stochastic measurement model of Definition 76. The stochastic measurement history and measurement history sample, through time $t_i$, are defined as

\[ Z(t_i) = \{z(t_1), z(t_2), \ldots, z(t_i)\} \tag{3.204} \]

\[ Z_i = \{z_1, z_2, \ldots, z_i\} \tag{3.205} \]

respectively, where $z_i$ is a convenient notation for a specific realization of the random vector $z(t_i)$. Additionally, since the sets “grow” with each new measurement, they are related as

\[ Z(t_1) \subset Z(t_2) \subset \cdots \subset Z(t_i) \subset \mathcal{Z} \tag{3.206} \]

\[ Z_1 \subset Z_2 \subset \cdots \subset Z_i \subset Z \tag{3.207} \]

Now we can state (in a theorem) the LIMVUE for a stochastic measurement process: an estimator that employs a sequence of measurements to improve the existing estimate. By this, we mean an estimator $\hat{x}(t_{i+1})$ that updates $\hat{x}(t_i)$ using the new measurement $z(t_{i+1})$, that is, a recursive estimator for a stochastic process.

Theorem 78 (LIMVUE for Stochastic Processes) Let the state, $x$, measurement, $z$, and measurement-corruption noise, $v$, be as described in Definition 76 and the measurement history in Definition 77. Let $z(t_j)$ for $j = 1, 2, \ldots, i$ be the random measurement vectors generating a subspace $\mathbb{Z}_i$ of $\mathcal{Z}$. Then $\hat{x}(t_i) = E[x(t_i) \mid Z(t_i)]$ is the conditional state estimator, an orthogonal projection on closed subspace $\mathbb{Z}_i$ of $\mathcal{Z}$. Note that expectations seen in Theorems 71 and 75 are all now replaced with conditional expectations, conditioned on the previous measurement history. Finally, we denote the projection of $z(t_{i+1})$ onto subspace $\mathbb{Z}_i$ by $\hat{z}(t_{i+1}^-)$, where $z(t_{i+1})$ generates the subspace $\mathbb{Z}_{i+1}$ and the superscript $-$ indicates that the estimate is based on the “old” information up through time $t_i$.

Therefore, the LIMVUE for a stochastic measurement process is the conditional state estimator

\[ \hat{x}(t_i) = \hat{x}(t_i^-) + K(t_i)\,[z(t_i) - \hat{z}(t_i^-)] \tag{3.208} \]

where $\hat{x}(t_i^-)$ is the conditional state estimator based on the “old” information up through time $t_{i-1}$, that is, $\hat{x}(t_i^-) = E[x(t_i) \mid Z(t_{i-1})]$, and the Kalman gain transformation takes the form

\[ K(t_i) = P(t_i^-)\,H^*(t_i)\,A^{-1}(t_i) \tag{3.209} \]

and the filter-computed residual covariance operator is

\[ A(t_i) = H(t_i)\,P(t_i^-)\,H^*(t_i) + R(t_i) \tag{3.210} \]

Additionally, the corresponding conditional error covariance operator, defined by

\begin{align}
P(t_i) &= \Sigma\{[x(t_i) - \hat{x}(t_i)] \mid Z(t_i) = Z_i\} \notag \\
&= E\{[x(t_i) - \hat{x}(t_i)] \circ [x(t_i) - \hat{x}(t_i)] \mid Z(t_i) = Z_i\} \tag{3.211}
\end{align}

is given by

\[ P(t_i) = P(t_i^-) - K(t_i)\,H(t_i)\,P(t_i^-) \tag{3.212} \]

The boxology for this estimator is given in Figure 3.11.

Figure 3.11 Boxology of the Stochastic LIMVUE.




Remark Thus far, we have only considered the measurement side in preparation for constructing the ISKF, without due regard to the dynamics of the system (or process); this will be addressed in the following section when we introduce the dynamics model for the system. After the dynamics model is added, Equation (3.208) will be amended to reflect that we are now updating the “propagated” state estimator versus the previous state estimator, $\hat{x}(t_1)$, based on the measurement at time $t_1$. The “propagated” state estimator is due to the dynamics model “propagating” the state estimator from time $t_1$ to time $t_2$; we denote this new estimator by $\hat{x}(t_2^-)$, where the superscript minus sign indicates the time instant just prior to measurement incorporation.

Proof of Theorem 78 If we make the following substitutions:

\begin{align}
z &= z(t_i) \tag{3.213} \\
\mu_z &= \hat{z}(t_i^-) \tag{3.214} \\
x &= x(t_i) \tag{3.215} \\
\mu_x &= \hat{x}(t_i^-) \tag{3.216} \\
\hat{x} &= \hat{x}(t_i) \tag{3.217}
\end{align}

then we have restated Theorem 71, and thus with the aid of Lemma 74, Equation (3.208) follows and therefore Theorem 78 holds. Note that we have used means conditioned on the previous measurement history, $\hat{z}(t_i^-)$ and $\hat{x}(t_i^-)$, in the place of unconditional means, $\mu_z$ and $\mu_x$, in our substitution scheme, respectively. ■

Another way of verifying Theorem 78 begins by noting that the conditional state estimator $\hat{x}(t_1)$ is the best estimate of $x(t_1)$ given subspace $\mathbb{Z}_1$, where $\mathbb{Z}_1$ was generated by measurement $z(t_1)$. The next measurement $z(t_2)$, along with $z(t_1)$, generates subspace $\mathbb{Z}_2$, while $\hat{z}(t_2^-)$ denotes the projection of $z(t_2)$ onto subspace $\mathbb{Z}_1$. Hence, $\hat{z}(t_2^-)$ is the best estimate of $z(t_2)$ given subspace $\mathbb{Z}_1$. Let $r(t_2) = z(t_2) - \hat{z}(t_2^-)$ be the residual (or difference between the true measurement and the best prediction of the measurement) at time $t_2$. It then follows that the projection of $x(t_2)$ onto the subspace $\mathbb{Z}_2$, denoted $\hat{x}(t_2)$, is given by

\[ \hat{x}(t_2) = \hat{x}(t_1) + \Sigma[x(t_2), r(t_2)]\,\Sigma^{-1}[r(t_2)]\,r(t_2) \tag{3.218} \]

Now observe that the estimate of $x(t_2)$, which lives in subspace $\mathbb{Z}_1 + \mathbb{Z}_2 \subset \mathcal{Z}$, can be decomposed as

\[ \mathbb{Z}_1 + \mathbb{Z}_2 = \mathbb{Z}_1 \oplus \mathbb{Y}_2 \tag{3.219} \]

where $\mathbb{Y}_2 = \{z : z \in \mathbb{Z}_2 \text{ and } z \notin \mathbb{Z}_1\}$, or in words, $\mathbb{Y}_2$ is that part of $\mathbb{Z}_2$ which is not in $\mathbb{Z}_1 \cap \mathbb{Z}_2$, and $\oplus$ is the direct sum operation. Thus $\mathbb{Y}_2$ and $\mathbb{Z}_1$ are orthogonal subspaces, i.e., $\mathbb{Y}_2 \perp \mathbb{Z}_1$. Hence, Equation (3.218) can be written as

\[ \hat{x}(t_2) = \hat{x}(t_1) + y(t_2) \tag{3.220} \]

where

\[ y(t_2) = \Sigma[x(t_2), r(t_2)]\,\Sigma^{-1}[r(t_2)]\,r(t_2) \in \mathbb{Y}_2 \tag{3.221} \]

represents the new information about $x$ brought by the second measurement and

\[ \hat{x}(t_1) \in \mathbb{Z}_1 \tag{3.222} \]

represents the best estimate from the first (or previous) measurement. Therefore, we see that the projection onto a sum of subspaces is equal to the sum of individual projections when the subspaces are orthogonal. Q.E.D. again.
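The orthogonal-projection argument can be checked numerically in a finite-dimensional setting: processing two measurements one at a time with Equations (3.208) through (3.212) should give the same estimate and error covariance as processing both at once. The sketch below assumes matrices in place of operators, a static state between the two measurement times (no dynamics model yet), independent noises, and a predicted measurement equal to H times the prior estimate. The function name `update` and all numbers are hypothetical.

```python
import numpy as np

def update(x_prior, P_prior, z, H, R):
    """One measurement update with matrices standing in for operators."""
    A = H @ P_prior @ H.T + R                   # residual covariance, (3.210)
    K = np.linalg.solve(A, H @ P_prior).T       # Kalman gain, (3.209)
    x_post = x_prior + K @ (z - H @ x_prior)    # state update, (3.208)
    P_post = P_prior - K @ H @ P_prior          # covariance update, (3.212)
    return x_post, P_post

mu_x = np.array([0.0, 0.0])
P0 = np.array([[2.0, 0.5], [0.5, 1.0]])
H1, R1, z1 = np.array([[1.0, 0.0]]), np.array([[0.5]]), np.array([1.2])
H2, R2, z2 = np.array([[1.0, 1.0]]), np.array([[0.25]]), np.array([0.7])

# Recursive processing: first z(t_1), then z(t_2).
x1, P1 = update(mu_x, P0, z1, H1, R1)
x2, P2 = update(x1, P1, z2, H2, R2)

# Batch processing: both measurements stacked into a single vector.
H_all = np.vstack([H1, H2])
R_all = np.block([[R1, np.zeros((1, 1))], [np.zeros((1, 1)), R2]])
z_all = np.concatenate([z1, z2])
x_batch, P_batch = update(mu_x, P0, z_all, H_all, R_all)
```

The agreement of `x2` with `x_batch` (and of `P2` with `P_batch`) is the matrix counterpart of the statement that the projection onto the sum of orthogonal subspaces equals the sum of the individual projections.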

3.4 Dynamics Model

One of the first tasks in model-based estimation is to create a mathematical model of the system of interest. Many physically motivated problems are well modeled using a linear continuous-time dynamics model of the stochastic system; our first definition is an abstraction of that model.


Definition 79 (Continuous-Time Dynamics Model) A continuous-time and space model for the linear dynamics of a stochastic state process $x$ can be viewed as a set of random vectors $\{x(t) : t \in T\}$, where $T = [t_0, t_f] \subset \mathbb{R}^+$, described by the stochastic differential equation

\begin{align}
dx(t) &= [F(t)\,x(t) + B(t)\,u(t)]\,dt + G(t)\,db(t) \notag \\
x(t_0) &= x_0 \tag{3.223}
\end{align}

where the vectors, operators, and transformations are defined at time $t$ as

$x(t) \in \mathcal{X} = L_2(\Omega, P; X)$ ... state vector
$F(t) \in \mathcal{LO}(X)$ ... state distributor operator
$B(t) \in \mathcal{LT}(U, X)$ ... input distributor transformation
$u(t) \in U$ ... known input vector
$G(t) \in \mathcal{BLT}(B, X)$ ... noise distributor transformation
$b(t) \in \mathcal{B} = L_2(\Omega, P; B)$ ... Brownian motion noise vector

Additionally, $b$ is a Brownian motion process (and thus an independent increment process$^{19}$) with constant diffusion operator $Q$ as discussed in Definition 61. Furthermore, the dynamics model must include the pertinent boundary conditions for the specific problem at hand. For example, if $F = \nabla$, a gradient, then we could associate a Dirichlet or Neumann boundary condition with Equation (3.223).

While many of the problems of interest are best modeled with a continuous-time description discussed in Definition 79, we will most likely need a discrete-time model so that the eventual filtering algorithm can be implemented on a digital computer as software. On the other hand, if the problem is posed in a discrete-time format, then the following definition still applies, but without the interpretation from the continuous-time model, as is already stated. One drawback to a naturally discrete-time dynamics model is that there is no guarantee that the state transition operator, as defined below, will be invertible; this is precisely the same situation as discussed in Section 2.3.1, regarding the state transition matrix for the finite-dimensional dynamics model.

19 The most general form of additive noise need only be an “independent increment” process, hence we may also develop a continuous-time dynamics model which features a generalized Poisson noise process [141, 129, 66].

The following model is termed an equivalent discrete-time model$^{20}$ since the state at any time $t_i$ is precisely the same as the state using the continuous-time model at time $t = t_i$. Note that the set of time instants in the continuous-time model is a continuum, i.e., $T = [t_0, t_f]$, whereas for the discrete-time model, it is a discrete set: $T = \{t_0, t_1, \ldots, t_f\}$. The process for creating the equivalent discrete-time model is a substantially different process than merely sampling the continuous-time process, as will be seen in the development following the definition.


Definition 80 (Discrete-Time Dynamics Model) A discrete-time and space model for the linear dynamics of a stochastic state process $x$ can be viewed as a sequence of random vectors $\{x(t) : t \in T\}$, where $T = \{t_0, t_1, \ldots, t_f\} \subset \mathbb{R}^+$, by the stochastic difference equation

\begin{align}
x(t_{i+1}) &= \Phi(t_{i+1}, t_i)\,x(t_i) + B_d(t_i)\,u(t_i) + G_d(t_i)\,w_d(t_i) \notag \\
x(t_0) &= x_0 \tag{3.224}
\end{align}

where the vectors, operators, and transformations are defined at time $t_i$ as

$x(t_i) \in \mathcal{X} = L_2(\Omega, P; X)$ ... state vector
$\Phi(t_{i+1}, t_i) \in \mathcal{BLO}(X)$ ... state transition operator from time $t_i$ to $t_{i+1}$
$B_d(t_i) \in \mathcal{LT}(U, X)$ ... discrete-time input distributor transformation
$u(t_i) \in U$ ... known control input vector
$G_d(t_i) \in \mathcal{BLT}(W, X)$ ... discrete-time noise distributor transformation
$w_d(t_i) \in \mathcal{W} = L_2(\Omega, P; W)$ ... zero-mean white Gaussian noise vector

where the zero-mean white Gaussian dynamics noise process, $w_d(\cdot,\cdot)$, has covariance kernel

\[ \Sigma[w_d(t_i), w_d(t_j)] = E[w_d(t_i) \circ w_d(t_j)] = Q_d(t_i)\,\delta_{ij} \tag{3.225} \]

where the bounded and linear covariance operator $Q_d(t_i) \in \mathcal{BLO}(W)$ is symmetric, positive, and nuclear for all time $t_i$. Furthermore, the initial state condition $x(t_0)$ is not known precisely; it will be modeled as a Gaussian random vector, independent of $w_d(\cdot,\cdot)$, with mean and covariance specified as follows

\[ E[x(t_0)] = \hat{x}_0 \tag{3.226} \]

\[ \Sigma[x(t_0)] = E\{[x(t_0) - \hat{x}_0] \circ [x(t_0) - \hat{x}_0]\} = P_0 \tag{3.227} \]

where the initial error covariance operator, $P_0 \in \mathcal{BLO}(X)$, is symmetric, positive, and nuclear. For time $t_{i+1} = t_1, \ldots, t_f$, the conditional error covariance operator is defined by

\begin{align}
P(t_{i+1}^-) &= \Sigma\{[x(t_{i+1}) - \hat{x}(t_{i+1}^-)] \mid Z(t_i) = Z_i\} \notag \\
&= E\{[x(t_{i+1}) - \hat{x}(t_{i+1}^-)] \circ [x(t_{i+1}) - \hat{x}(t_{i+1}^-)] \mid Z(t_i) = Z_i\} \tag{3.228}
\end{align}

20 See Maybeck [129] for the analogous finite-dimensional case.

The boxology for the discrete-time dynamics model appears in Figure 3.12. Note that, for illustrative purposes, we have chosen to use two boxes to represent the Hilbert space of random vectors, $\mathcal{X}$, one for time $t_i$ and one for time $t_{i+1}$. On the other hand, we have just one box for the realization space, $(X, \mathcal{B}(X), P_X)$.

Figure 3.12 Boxology of a Discrete-Time Dynamics Model.

If the discrete-time dynamics model is based on the continuous-time model detailed in Definition 79, then $\Phi(t_{i+1}, t_i)$, $B_d(t_i)$, $u(t_i)$, $G_d(t_i)$, $Q_d(t_i)$, and $w_d(t_i)$ are based on the following development entailing a series of definitions and theorems, which results from solving the stochastic differential equation. The solution (or mild form) of Equation (3.223) is given by the so-called evolution system [38, 66]

\[ x(t) = \Phi(t, t_0)\,x_0 + \int_{t_0}^{t} \Phi(t, s)\,B(s)\,u(s)\,ds + \int_{t_0}^{t} \Phi(t, s)\,G(s)\,db(s) \tag{3.229} \]

where $\Phi(t, s)$ is the state transition or mild evolution operator associated with the state distributor operator, $F(t)$ [38, 160]. The theory for evolution operators is rather technical and is not needed to develop the theory for the class of problems in this research$^{21}$. Therefore, the full theory for (one- and two-parameter) evolution operators will be neither reviewed nor pursued in this research.

While a time-varying state distributor operator, $F(t)$, generates a semigroup of two-parameter state transition operators, $\Phi(t, s)$, a time-invariant state distributor operator, $F$, generates a semigroup of one-parameter state transition operators, $\Phi(t - s)$ [38, 160, 39, 48, 115]. The single parameter is denoted by the “time” difference $t - s$ for $0 \le s \le t < \infty$, just as it was for the finite-dimensional case we reviewed in Section 2.3.1, Equation (2.8), on page 2-11.

Briefly,  the  plan  for  the  rest  of  this  section  falls  into  two  main  parts.  First,  we 
will  discuss  two  types  of  one-parameter  semigroups  of  BLOs.  Then,  we  will  match 
up  the  terms  in  Equations  (3.223)  and  (3.229)  as  we  determine  Bd,  Gd,  and  wd. 

The  pertinent  theory  for  generating  these  one-parameter  semigroups  is  included 
in  the  following  series  of  definitions  and  theorems.  We  begin  with  the  definition  for 
a  semigroup  of  BLOs. 

Definition 81 (Semigroup of BLOs) Let $X$ be a Banach space. A one-parameter family of BLOs, denoted by $\{\Phi(t) : t \ge 0\}$ for $0 \le t < \infty$, is a semigroup of BLOs on $X$ if [160]:

1. $\Phi(0) = I$, where $I$ is the identity operator on $X$, and
2. $\Phi(t + s) = \Phi(t)\,\Phi(s)$ for every $t, s \ge 0$

There are several types of one-parameter semigroups; we shall discuss just two: the uniformly and the strongly continuous semigroup of BLOs. These categories of operators are due to the nature of the generating time-invariant operator $F$ discussed above. The uniformly continuous semigroup of BLOs is included to show where the finite-dimensional theory and the infinite-dimensional theory appear to agree in the form of their equations, while the strongly continuous semigroup of BLOs is needed for our extended example discussed in the next chapter.

21 Limiting our explication to one-parameter semigroups is not as restrictive as it may seem according to Engel and Nagel [48]. They (along with their collaborators) have studied population, nuclear transport, delay differential, and Volterra equations, and both ordinary and partial differential operators in the form of an abstract Cauchy problem using the theory of one-parameter semigroups.

Definition 82 (Uniformly Continuous Semigroup of BLOs) A semigroup of BLOs, $\{\Phi(t) : t \ge 0\}$, is said to be uniformly continuous if

\[ \lim_{t \downarrow 0} \|\Phi(t) - I\| = 0 \tag{3.230} \]

In  general,  F  does  not  need  to  be  bounded,  as  seen  in  Definition  79.  However, 
when  F  is  bounded,  it  is  the  infinitesimal  generator  for  a  uniformly  continuous 
semigroup  of  operators. 

Theorem 83 (Properties of a Uniformly Continuous Semigroup of BLOs) Let $\{\Phi(t) : t \ge 0\}$ be a uniformly continuous semigroup of BLOs. The infinitesimal generator $F$ for a uniformly continuous semigroup is a BLO. Then [160]:

1. There exists a constant $\omega \ge 0$ such that $\|\Phi(t)\| \le \exp(\omega t)$.

2. There exists a unique BLO $F$ such that $\Phi(t) = \exp(tF)$.

3. The operator $F$ in part (2) is the infinitesimal generator of $\{\Phi(t) : t \ge 0\}$.

4. $t \mapsto \Phi(t)$ is differentiable in norm and

\[ \frac{d\Phi(t)}{dt} = F\,\Phi(t) = \Phi(t)\,F \tag{3.231} \]

Proof  of  Theorem  83  See  Pazy  [160]. 
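The properties listed in Theorem 83 can be observed directly when the generator is a matrix, which is always a bounded operator. The sketch below is illustrative only: the 2-by-2 generator `F`, the helper `expm_series` (a truncated power series for the exponential, adequate for small matrices but not a production algorithm), and the chosen times are all assumptions.

```python
import numpy as np

def expm_series(A, terms=40):
    """Truncated power series for exp(A); adequate for small, well-scaled A."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # A^k / k!
        result = result + term
    return result

F = np.array([[0.0, 1.0], [-2.0, -0.3]])    # hypothetical bounded generator

def Phi(t):
    """Uniformly continuous semigroup Phi(t) = exp(tF)."""
    return expm_series(t * F)

t, s, h = 0.7, 0.4, 1e-6
I2 = np.eye(2)

identity_gap = np.linalg.norm(Phi(0.0) - I2, 2)                   # Phi(0) = I
semigroup_gap = np.linalg.norm(Phi(t + s) - Phi(t) @ Phi(s), 2)   # Phi(t+s) = Phi(t)Phi(s)
uniform_gap = np.linalg.norm(Phi(1e-4) - I2, 2)                   # Equation (3.230)

# Property 1 with omega = ||F||, and property 4 via a central difference.
omega = np.linalg.norm(F, 2)
growth_ok = np.linalg.norm(Phi(t), 2) <= np.exp(omega * t)
derivative = (Phi(t + h) - Phi(t - h)) / (2.0 * h)
derivative_gap = np.linalg.norm(derivative - F @ Phi(t), 2)       # Equation (3.231)
commute_gap = np.linalg.norm(F @ Phi(t) - Phi(t) @ F, 2)
```

Here `uniform_gap` is of the order of 1e-4 times the norm of `F`, consistent with convergence to the identity in operator norm as t decreases to zero.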

In our research, our $F$ is unbounded. Thus, we employ the strongly continuous semigroup of BLOs, which is defined next.

Definition 84 (Strongly Continuous Semigroup of BLOs) Let $X$ be a Banach space. A semigroup, $\{\Phi(t) : t \ge 0\}$, of BLOs on $X$ is a strongly continuous semigroup of BLOs, for $0 \le t < \infty$, if

\[ \lim_{t \downarrow 0} \Phi(t)\,x = x \tag{3.232} \]

for every $x \in X$. A strongly continuous semigroup of BLOs on $X$ is called a semigroup of class $C_0$ or simply a $C_0$ semigroup.

Some useful properties of this class of semigroup operators are reported in the following theorem$^{22}$.


22 Curtain [38, 39] and Lax [115] also present a similar collection of semigroup properties in the infinite-dimensional linear systems framework.

Theorem 85 (Properties of a Strongly Continuous Semigroup of BLOs) Let $\Phi(t)$ be a $C_0$ semigroup and let $F$ be the infinitesimal generator. Then [160]:

1. For $x \in X$,

\[ \lim_{h \to 0} \frac{1}{h} \int_{t}^{t+h} \Phi(s)\,x\,ds = \Phi(t)\,x \tag{3.233} \]

2. For $x \in X$,

\[ \int_{0}^{t} \Phi(s)\,x\,ds \in \mathcal{D}(F) \tag{3.234} \]

where $\mathcal{D}(F)$ denotes the domain and

\[ F\left(\int_{0}^{t} \Phi(s)\,x\,ds\right) = \Phi(t)\,x - x \tag{3.235} \]

3. For $x \in \mathcal{D}(F)$,

\[ \Phi(t)\,x - \Phi(s)\,x = \int_{s}^{t} \Phi(\tau)\,F x\,d\tau = \int_{s}^{t} F\,\Phi(\tau)\,x\,d\tau \tag{3.236} \]

4. For $x \in \mathcal{D}(F)$, we have $\Phi(t)\,x \in \mathcal{D}(F)$ and

\[ \frac{d}{dt}\Phi(t)\,x = F\,\Phi(t)\,x = \Phi(t)\,F x \tag{3.237} \]

Proof  of  Theorem  85  See  Pazy  [160]. 
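To make the distinction between uniform and strong continuity concrete, the sketch below uses an assumed example that is not developed in this section: the heat semigroup on $L_2(0, \pi)$ with Dirichlet boundary conditions, written in modal coordinates so that $F$ has eigenvalues $-n^2$ and $\Phi(t)$ multiplies the $n$-th coefficient by $\exp(-n^2 t)$. The truncation level `N`, the test element `x`, and the sample times are hypothetical choices, and a finite truncation can only suggest the infinite-dimensional behavior.

```python
import numpy as np

N = 2000                                  # number of retained modes (a truncation)
n = np.arange(1, N + 1, dtype=float)
eigvals = -(n ** 2)                       # spectrum of F = d^2/dxi^2 with Dirichlet BCs

def Phi(t, coeffs):
    """Apply the heat semigroup to a vector of modal coefficients."""
    return np.exp(eigvals * t) * coeffs

x = 1.0 / n                               # a fixed square-summable element
t, s = 0.05, 0.02
taus = (1e-2, 1e-4, 1e-6)

# Semigroup property: Phi(t + s) x = Phi(t) Phi(s) x.
semigroup_gap = np.linalg.norm(Phi(t + s, x) - Phi(t, Phi(s, x)))

# Strong continuity, Equation (3.232): ||Phi(tau) x - x|| shrinks with tau
# for this fixed x.
strong_gaps = [np.linalg.norm(Phi(tau, x) - x) for tau in taus]

# The operator norm ||Phi(tau) - I|| = max_n |exp(-n^2 tau) - 1| stays close
# to 1 on the highest retained modes, so Equation (3.230) fails as N grows.
uniform_gaps = [np.max(np.abs(np.exp(eigvals * tau) - 1.0)) for tau in taus]
```

The contrast between the shrinking `strong_gaps` and the nearly constant `uniform_gaps` is the reason an unbounded generator calls for the $C_0$ framework of Definition 84 rather than Definition 82.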

Now that we've laid the groundwork for the one-parameter semigroup of BLOs, we turn to finding the equivalent discrete-time transformations and additive noise shown in Equation (3.224). While we have generally limited our discussion of the semigroups to those with a single parameter, the theory that follows does not depend on this simplification; hence, the remainder of the development will employ the (less restrictive) two-parameter notation for the state transition operator, $\Phi(t, s)$.


Theorem 86 (Equivalent Discrete-Time Input Distributor Transformation) Given the continuous-time dynamics model in Definition 79 and the desired form of the discrete-time dynamics model in Definition 80, the equivalent discrete-time input distributor transformation is given by

\[ B_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, \tau)\,B(\tau)\,d\tau \tag{3.238} \]

provided $u(t)$ is a piece-wise constant function, constant over each sample period.

Remarks (1) Practically speaking, the restriction on control input $u(t)$ merely reflects the case of adjusting the control input at the end of each propagation cycle, i.e., after a sample period has ended. If this control input is generated by a digital computer, the interface to the continuous-time system is assumed to be through a zero-order hold, thus keeping the control value constant over the ensuing sample period. (2) For many problems, such as the heat equation example discussed in the following chapter, the control input may also be a function of the spatial dimension. There is nothing in this definition that precludes us from allowing the control input to vary over the spatial dimension.

Proof of Theorem 86 If we were to compute the state using Equation (3.229) at the discrete time instants $T = \{t_0, t_1, \ldots, t_f\}$, then we could rewrite it with subscripted time arguments as follows

\begin{align}
x(t_{i+1}) &= \Phi(t_{i+1}, t_i)\,x(t_i) + \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,B(s)\,u(s)\,ds \notag \\
&\quad + \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,G(s)\,db(s) \tag{3.239}
\end{align}

where $t_0 \le t_i < t_{i+1} \le t_f$. Matching up the terms in Equations (3.224) and (3.239) yields

\[ B_d(t_i)\,u(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,B(s)\,u(s)\,ds \tag{3.240} \]

If we assume that the control input is constant for this particular time interval, i.e., $u(s) = u(t_i)$ for $t_i \le s < t_{i+1}$, then we may pull $u(s)$ outside of the integral in Equation (3.240) to get

\[ B_d(t_i)\,u(t_i) = \left[\int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,B(s)\,ds\right] u(t_i) \tag{3.241} \]

Since Equation (3.241) holds for all piece-wise constant $u(t)$, we have found the $B_d(t_i)$ operator in Equation (3.238). ■
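Equation (3.238) can be checked against a closed form in the simplest possible case. The sketch below assumes a hypothetical scalar, time-invariant model with $F = -a$ and $B = b$, so that $\Phi(t, s) = \exp(-a(t - s))$ and the integral evaluates to $b(1 - e^{-a\,\Delta t})/a$. The helper `trapezoid`, the grid size, and all parameter values are illustrative assumptions.

```python
import numpy as np

a, b, dt = 2.0, 3.0, 0.1      # hypothetical model: F = -a, B = b, sample period dt

def Phi(t, s):
    """State transition operator for the time-invariant scalar generator F = -a."""
    return np.exp(-a * (t - s))

def trapezoid(values, grid):
    """Composite trapezoidal rule on the given grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

t_i, t_next = 0.0, dt
tau = np.linspace(t_i, t_next, 2001)
Bd_numeric = trapezoid(Phi(t_next, tau) * b, tau)     # Equation (3.238)
Bd_closed = b * (1.0 - np.exp(-a * dt)) / a           # closed form for this model

# One noise-free propagation step with a zero-order-hold input, compared with
# the exact solution of dx/dt = -a x + b u over the same sample period.
x_i, u_i = 1.5, 0.8
x_discrete = Phi(t_next, t_i) * x_i + Bd_numeric * u_i
x_exact = np.exp(-a * dt) * x_i + (b * u_i / a) * (1.0 - np.exp(-a * dt))
```

Because the input is held constant over the sample period, the discrete-time step reproduces the continuous-time state at $t_{i+1}$ up to quadrature error, which is the sense in which the discrete-time model is called equivalent.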


Definition 87 (Equivalent Discrete-Time Noise Distributor Transformation) Analogous to the finite-dimensional case developed in [129], we define $G_d(t_i)$ to be an identity operator on $W$.


Theorem 88 (Equivalent Discrete-Time Noise Characterization) For the continuous-time dynamics model in Definition 79 and the desired form of the discrete-time dynamics model in Definition 80, the equivalent discrete-time noise vector, defined by

\[ w_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,G(s)\,db(s) \tag{3.242} \]

is a zero-mean white Gaussian with covariance kernel operator, $\Sigma[w_d(t_i), w_d(t_k)] = 0$, for $t_i \ne t_k$, and covariance operator

\[ \Sigma[w_d(t_i), w_d(t_k)] = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,G(s)\,Q\,G^*(s)\,\Phi^*(t_{i+1}, s)\,ds = Q_d(t_i) \tag{3.243} \]

whenever $t_i = t_k$.

Proof of Theorem 88 Let the third term on the right-hand side of Equation (3.239) be identified as the equivalent discrete-time noise vector. To include the possibility of considering $w_d$ for more than a single interval, such as $t_i$ to $t_{i+1}$, we write it as a function of two (possibly nonconsecutive) time instants $t_i < t_j$

\[ w_d(t_i, t_j) = \int_{t_i}^{t_j} \Phi(t_j, s)\,G(s)\,db(s) \tag{3.244} \]

The mean of $w_d(t_i, t_j)$ is zero, i.e., [38]

\[ E[w_d(t_i, t_j)] = E\left[\int_{t_i}^{t_j} \Phi(t_j, s)\,G(s)\,db(s)\right] = 0 \tag{3.245} \]

Next, we write the covariance for the case of overlapping intervals

\[ \Sigma[w_d(t_i, t_j), w_d(t_k, t_l)] = E[w_d(t_i, t_j) \circ w_d(t_k, t_l)] \tag{3.246} \]

Then,

\begin{align}
&E[w_d(t_i, t_j) \circ w_d(t_k, t_l)] \notag \\
&\quad = \int_{\max(t_i, t_k)}^{\min(t_j, t_l)} \Phi(\min(t_j, t_l), s)\,G(s)\,Q\,G^*(s)\,\Phi^*(\min(t_j, t_l), s)\,ds \tag{3.247}
\end{align}

where $t_j > t_k$. The covariance for non-overlapping intervals, i.e., whenever either $t_j < t_k$ or $t_l < t_i$ is true, is [38]

\[ \Sigma[w_d(t_i, t_j), w_d(t_k, t_l)] = 0 \tag{3.248} \]

Note that we only assumed that $t_i < t_j$ and $t_k < t_l$, and that in general, $t_j - t_i$ need not equal $t_l - t_k$. When $t_i$ and $t_j$ are consecutive times such that $t_j = t_{i+1}$, and similarly for $t_k$ and $t_l$, then we need only one time argument for $w_d$; thus, we obtain

\[ \Sigma[w_d(t_i), w_d(t_k)] = \begin{cases} \int_{t_i}^{t_{i+1}} \Phi(t_{i+1}, s)\,G(s)\,Q\,G^*(s)\,\Phi^*(t_{i+1}, s)\,ds, & t_i = t_k \\ 0, & t_i \ne t_k \end{cases} \tag{3.249} \]

Note that when $t_i \ne t_k$, we have nonoverlapping intervals since all of our intervals are disjoint per our construction. ■
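The covariance integral in Equation (3.243) can likewise be evaluated numerically and compared with a closed form. The sketch below assumes a hypothetical three-mode diagonal model, $F = \mathrm{diag}(-\lambda_n)$, $G = I$, and a diagonal diffusion $Q$, for which each diagonal entry of $Q_d$ equals $q_n (1 - e^{-2\lambda_n \Delta t}) / (2\lambda_n)$. The decay rates, diffusion values, sample period, and quadrature grid are illustrative assumptions.

```python
import numpy as np

lam = np.array([1.0, 4.0, 9.0])       # hypothetical decay rates: F = diag(-lam)
G = np.eye(3)                         # noise distributor
Q = np.diag([0.5, 0.2, 0.1])          # constant diffusion of the Brownian motion b
dt = 0.05                             # sample period t_{i+1} - t_i

def Phi(t, s):
    """State transition matrix generated by F = diag(-lam)."""
    return np.diag(np.exp(-lam * (t - s)))

def integrand(s, t_next):
    """Phi(t_{i+1}, s) G Q G* Phi*(t_{i+1}, s), the integrand of (3.243)."""
    P = Phi(t_next, s)
    return P @ G @ Q @ G.T @ P.T

# Composite trapezoidal rule applied entrywise over [t_i, t_{i+1}] = [0, dt].
s_grid = np.linspace(0.0, dt, 1001)
vals = np.array([integrand(s, dt) for s in s_grid])
h = s_grid[1] - s_grid[0]
Qd_numeric = h * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])

# Closed form for this diagonal, time-invariant model.
Qd_closed = np.diag(np.diag(Q) * (1.0 - np.exp(-2.0 * lam * dt)) / (2.0 * lam))
```

The resulting `Qd_numeric` is symmetric and positive, matching the properties required of $Q_d(t_i)$ in Definition 80.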


3.5 Infinite-Dimensional Sampled-Data Kalman Filter

The first Kalman filter was derived for a discrete-time environment with finite-dimensional states by Kalman in 1960 [95]. One year later, Kalman and Bucy combined efforts to pose the Kalman-Bucy filter to treat continuous-time-measurement estimation problems [96]. Many physically motivated problems are set in a (more general) Hilbert space that is not necessarily of finite dimension. In 1967, Falb contributed the infinite-dimensional Kalman-Bucy filter (IKBF) [51]. Note that when we say “infinite-dimensional,” what we are really saying is that the states do not have to be finite length vectors: they can be functions or some other objects defined on a Hilbert space. The following table shows that with the addition of the new infinite-dimensional sampled-data Kalman filter (ISKF), linear filtering theory, consisting of the four filters of Table 3.1, forms a complete set of optimal estimation tools that can be applied in practice.

                        Discrete-time           Continuous-time
Finite-dimensional      Kalman filter (1960)    Kalman-Bucy filter (1961)
Infinite-dimensional    ISKF (2007)             IKBF (1967)

Table 3.1 Quartet of Kalman Filters

Before proceeding, we shall explicitly call attention to the two forms of the conditional error covariance that we previously defined in Equation (3.211) as we updated the estimate with a new observation and in Equation (3.228) in order to propagate the estimate from time $t_i$ to time $t_{i+1}$.

Definition 89 (Conditional error covariance) The propagated state estimator error is defined by

\[ e(t_i^-) = x(t_i) - \hat{x}(t_i^-) \tag{3.250} \]

and the zero-mean conditional error covariance operator for the propagated state estimator error is

\begin{align}
P(t_i^-) &= \Sigma[e(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}] \notag \\
&= E\{[x(t_i) - \hat{x}(t_i^-)] \circ [x(t_i) - \hat{x}(t_i^-)] \mid Z(t_{i-1}) = Z_{i-1}\} \tag{3.251}
\end{align}

where the realization of $\hat{x}(t_i^-)$ is used in the second line since the measurement is given and the ordered sets $Z(t_{i-1})$ and $Z_{i-1}$ represent the stochastic measurement history and measurement history sample, respectively, through time $t_{i-1}$ as defined in Equations (3.204) and (3.205).

Similarly, the updated state estimator error is defined by

\[ e(t_i^+) = x(t_i) - \hat{x}(t_i^+) \tag{3.252} \]

and the zero-mean conditional error covariance operator for the updated state estimator error is given as

\begin{align}
P(t_i^+) &= \Sigma[e(t_i^+) \mid Z(t_i) = Z_i] \notag \\
&= E\{[x(t_i) - \hat{x}(t_i^+)] \circ [x(t_i) - \hat{x}(t_i^+)] \mid Z(t_i) = Z_i\} \tag{3.253}
\end{align}

where the realization of $\hat{x}(t_i^+)$ is used in the second line since the measurement is given and the ordered sets $Z(t_i)$ and $Z_i$ represent the stochastic measurement history and measurement history sample, respectively, through time $t_i$ as defined in Equations (3.204) and (3.205).

We pose the following lemma relating two conditional error covariances to their
respective conditional state covariances. These relationships are common knowledge
for finite-dimensional systems [129].

Lemma 90 Given Definition 89, the following equivalences hold

$$ \operatorname{cov}[e(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}] = \operatorname{cov}[x(t_i) \mid Z(t_{i-1}) = Z_{i-1}] \tag{3.254} $$

$$ \operatorname{cov}[e(t_i^+) \mid Z(t_i) = Z_i] = \operatorname{cov}[x(t_i) \mid Z(t_i) = Z_i] \tag{3.255} $$

Proof of Lemma 90 Per the definition of the error given in Equation (3.250), the
left-hand side of Equation (3.254) is

$$ \operatorname{cov}[e(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}] = \operatorname{cov}[x(t_i) - \hat{x}(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}] \tag{3.256} $$

Next, expand the right-hand side to get

\begin{align}
\operatorname{cov}[e(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}]
&= E\{[x(t_i) - \hat{x}(t_i^-)] \circ [x(t_i) - \hat{x}(t_i^-)] \mid Z(t_{i-1}) = Z_{i-1}\} \tag{3.257} \\
&= E\{[x(t_i) - \mathcal{E}[x(t_i) \mid Z(t_{i-1})]] \circ [x(t_i) - \mathcal{E}[x(t_i) \mid Z(t_{i-1})]] \mid Z(t_{i-1}) = Z_{i-1}\} \tag{3.258}
\end{align}

where lines one and two follow from the definition of the covariance and the
definition of the conditional state estimator, respectively. Since the measurement history
through time $t_{i-1}$ is known, the conditional state estimator $\mathcal{E}[x(t_i) \mid Z(t_{i-1})]$ is really
$E[x(t_i) \mid Z(t_{i-1}) = Z_{i-1}]$, which is just the conditional state estimate $\hat{x}(t_i^-)$; thus we
get

\begin{align}
\operatorname{cov}[e(t_i^-) \mid Z(t_{i-1}) = Z_{i-1}]
&= E\{[x(t_i) - \hat{x}(t_i^-)] \circ [x(t_i) - \hat{x}(t_i^-)] \mid Z(t_{i-1}) = Z_{i-1}\} \tag{3.259} \\
&= \operatorname{cov}[x(t_i) \mid Z(t_{i-1}) = Z_{i-1}] \tag{3.260}
\end{align}

since $\hat{x}(t_i^-) = E[x(t_i) \mid Z(t_{i-1}) = Z_{i-1}]$. A similar line of reasoning holds for the
conditional error covariance following measurement update. ■

We  are  now  ready  to  state  in  the  form  of  a  theorem  the  central  result  in  this 
chapter:  the  infinite-dimensional  sampled-data  Kalman  filter  (ISKF). 

Theorem 91 (ISKF) Given: the measurement model of Definition 76 and the
equivalent discrete-time dynamics model comprised of Definition 80. Thus, we have
the following stochastic difference equations²³,

$$ x(t_{i+1}) = \Phi(t_{i+1}, t_i)\, x(t_i) + B_d(t_i)\, u(t_i) + G_d(t_i)\, w_d(t_i) \tag{3.261} $$

and

$$ z(t_i) = H(t_i)\, x(t_i) + v(t_i) \tag{3.262} $$

where the dynamics model is further described by

$x(t_i) \in \mathcal{X} = L_2(\Omega, P; X)$ ... state vector
$\Phi(t_{i+1}, t_i) \in \mathcal{BLO}(X)$ ... state transition operator from time $t_i$ to $t_{i+1}$
$B_d(t_i) \in \mathcal{BLT}(U, X)$ ... discrete-time input distributor transformation
$u(t_i) \in U$ ... known control input vector
$G_d(t_i) \in \mathcal{BLT}(W, X)$ ... discrete-time noise distributor transformation
$w_d(t_i) \in \mathcal{W} = L_2(\Omega, P; W)$ ... zero-mean white Gaussian noise vector

and the components of the observation model are

$z(t_i) \in \mathcal{Z} = L_2(\Omega, P; Z)$ ... measurement vector
$H(t_i) \in \mathcal{BLT}(X, Z)$ ... measurement distributor transformation
$x(t_i) \in \mathcal{X} = L_2(\Omega, P; X)$ ... state vector
$v(t_i) \in \mathcal{V} = L_2(\Omega, P; V)$ ... measurement-corruption noise vector

Additionally, we assume that $v(t_i)$, $w_d(t_i)$, and the initial state $x(t_0)$ are mutually
independent for all time. Thus, $x(t_i)$ and $v(t_j)$ are independent for all time.

²³ $G_d(t_i)$ was not assumed to be the identity operator here, although without loss of generality,
it can be.

The ISKF algorithm consists of a two-step recursive process following initial
state and conditional error covariance estimates, which are actually the mean and
covariance of the Gaussian random vector $x(t_0)$:

$$ \hat{x}(t_0) = E[x(t_0)] = \hat{x}_0 \tag{3.263} $$

$$ P(t_0) = P_0 \tag{3.264} $$

At time $t_i$, the filter-computed residual covariance and Kalman gain transformation
are, respectively,

$$ A(t_i) = H(t_i)\, P(t_i^-)\, H^*(t_i) + R(t_i) \tag{3.265} $$

$$ K(t_i) = P(t_i^-)\, H^*(t_i)\, A^{-1}(t_i) \in \mathcal{BLT}(Z, X) \tag{3.266} $$

When the conditional state estimator $\mathcal{E}[x(t_i) \mid Z(t_i)]$ is evaluated with the current
measurement $z(t_i) = z_i$ it becomes a realization of $\hat{x}(t_i)$, which we denote by $\hat{x}(t_i^+)$; and
thus

$$ \begin{aligned} \hat{x}(t_i^+) &\triangleq E[x(t_i) \mid Z(t_i) = Z_i] \\ &= \hat{x}(t_i^-) + K(t_i)\, r(t_i) \end{aligned} \tag{3.267} $$

where

$$ r(t_i) = z_i - H(t_i)\, \hat{x}(t_i^-) \tag{3.268} $$

is called the measurement residual. The corresponding conditional error covariance
after the measurement update, defined in Equation (3.253), is given as

$$ P(t_i^+) = P(t_i^-) - K(t_i)\, H(t_i)\, P(t_i^-) \tag{3.269} $$

Next, the state estimator $\hat{x}(t_i) = \mathcal{E}[x(t_i) \mid Z(t_i)]$ is propagated to time $t_{i+1}$ using
Equation (3.261) and becomes $\hat{x}(t_{i+1}^-) = \mathcal{E}[x(t_{i+1}) \mid Z(t_i)]$, where it is then evaluated using
the previous measurement history $Z(t_i) = Z_i$ to produce the realization $\hat{x}(t_{i+1}^-)$, hence

$$ \begin{aligned} \hat{x}(t_{i+1}^-) &\triangleq E[x(t_{i+1}) \mid Z(t_i) = Z_i] \\ &= \Phi(t_{i+1}, t_i)\, \hat{x}(t_i^+) + B_d(t_i)\, u(t_i) \end{aligned} \tag{3.270} $$

The corresponding conditional error covariance, defined in Equation (3.251), is

$$ P(t_{i+1}^-) = \Phi(t_{i+1}, t_i)\, P(t_i^+)\, \Phi^*(t_{i+1}, t_i) + G_d(t_i)\, Q_d(t_i)\, G_d^*(t_i) \tag{3.271} $$

Note that $t_i^-$ denotes the time just before incorporating the measurement taken at
time $t_i$, i.e., it is the time to which the previous update is propagated, and time $t_i^+$
denotes the time at which the state is updated after the measurement was taken.
Thus, the progression of time is: $t_0 < t_1^- < t_1 < t_1^+ < t_2^- < t_2 < t_2^+ \cdots$.
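As an illustration only, the recursion of Theorem 91 can be sketched in Python for the special case in which every space is finite-dimensional, so that each transformation becomes a matrix and each adjoint becomes a transpose. This is an assumption made for the sketch, not the general Hilbert-space setting of the theorem; the function names and the scalar test values are hypothetical.

```python
import numpy as np

def iskf_update(x_prior, P_prior, z, H, R):
    """Measurement update, Equations (3.265)-(3.269), with matrices as stand-ins."""
    A = H @ P_prior @ H.T + R                 # residual covariance (3.265)
    # Gain P H^T A^{-1} (3.266); the transpose trick relies on A and P being symmetric.
    K = np.linalg.solve(A, H @ P_prior).T
    r = z - H @ x_prior                       # measurement residual (3.268)
    x_post = x_prior + K @ r                  # updated estimate (3.267)
    P_post = P_prior - K @ H @ P_prior        # updated covariance (3.269)
    return x_post, P_post, r, A

def iskf_propagate(x_post, P_post, Phi, Bd, u, Gd, Qd):
    """Time propagation, Equations (3.270)-(3.271), with matrices as stand-ins."""
    x_prior = Phi @ x_post + Bd @ u
    P_prior = Phi @ P_post @ Phi.T + Gd @ Qd @ Gd.T
    return x_prior, P_prior

# Hypothetical scalar example written with 1x1 arrays.
x_minus, P_minus = np.array([0.0]), np.array([[4.0]])
H, R = np.array([[1.0]]), np.array([[1.0]])
x_plus, P_plus, r, A = iskf_update(x_minus, P_minus, np.array([5.0]), H, R)
x_next, P_next = iskf_propagate(
    x_plus, P_plus,
    Phi=np.array([[0.5]]), Bd=np.array([[1.0]]), u=np.array([2.0]),
    Gd=np.array([[1.0]]), Qd=np.array([[0.1]]))
```

Solving a linear system with the residual covariance, rather than forming its inverse explicitly, is a numerical convenience chosen for the sketch; it is algebraically the same gain as Equation (3.266).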

The  boxology  for  the  ISKF  is  shown  in  Figure  3.13;  it  simply  combines  the 
previous  boxologies  for  the  stochastic  LIMVUE  seen  in  Figure  3.11,  page  3-69,  and 
the  dynamics  model  boxology  of  Figure  3.12  on  page  3-75. 

Proof of Theorem 91 Equations (3.263) and (3.264) are initialization statements
that do not need proving. Similarly, Equations (3.265), (3.266), and (3.268) are
meaningful shorthand notations that are useful quantities to analyze during
engineering studies of the problem at hand. Equations (3.270), (3.271), (3.267), and
(3.269) remain to be proven.
Figure 3.13 Boxology of the ISKF


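We begin by substituting Equation (3.261) into the definition for the estimate
in Equation (3.270) and then simplifying

\begin{align}
\hat{x}(t_{i+1}^-) &= E[x(t_{i+1}) \mid Z(t_i) = Z_i] \tag{3.272} \\
&= E\{[\Phi(t_{i+1}, t_i)\, x(t_i) + B_d(t_i)\, u(t_i) + G_d(t_i)\, w_d(t_i)] \mid Z(t_i) = Z_i\} \tag{3.273}
\end{align}

Then,

\begin{align}
\hat{x}(t_{i+1}^-) &= \Phi(t_{i+1}, t_i)\, E[x(t_i) \mid Z(t_i) = Z_i] + B_d(t_i)\, u(t_i) + G_d(t_i)\, E[w_d(t_i) \mid Z(t_i) = Z_i] \tag{3.274} \\
&= \Phi(t_{i+1}, t_i)\, \hat{x}(t_i^+) + B_d(t_i)\, u(t_i) \tag{3.275}
\end{align}

where the second line follows from the definition of $\hat{x}(t_i^+)$ and the fact that the
dynamics noise $w_d(t_i)$ is (assumed to be) zero-mean, white, and independent of $x(t_0)$ and of $v$,
and hence of $Z(t_i)$ as well, so that its conditional mean is zero. Thus Equation (3.270) results as proposed.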

Next, we use the propagation Equation (3.261), for $x(t_{i+1})$ in the equivalence of
the conditional error covariance to the conditional state covariance given in Equation
(3.254) to obtain

\begin{align}
P(t_{i+1}^-) &= \operatorname{cov}[x(t_{i+1}) \mid Z(t_i) = Z_i] \tag{3.276} \\
&= \operatorname{cov}[\Phi(t_{i+1}, t_i)\, x(t_i) + B_d(t_i)\, u(t_i) + G_d(t_i)\, w_d(t_i) \mid Z(t_i) = Z_i] \tag{3.277}
\end{align}

Dropping the $B_d(t_i)\, u(t_i)$ term as it is known and thus does not contribute to the
covariance and then expanding yields

$$ P(t_{i+1}^-) = \operatorname{cov}[\Phi(t_{i+1}, t_i)\, x(t_i) + G_d(t_i)\, w_d(t_i) \mid Z(t_i) = Z_i] \tag{3.278} $$
Expanding again,

$$ \begin{aligned} P(t_{i+1}^-) ={} & \operatorname{cov}[\Phi(t_{i+1}, t_i)\, x(t_i) \mid Z(t_i) = Z_i] \\ & + \operatorname{cov}\{[\Phi(t_{i+1}, t_i)\, x(t_i),\, G_d(t_i)\, w_d(t_i)] \mid Z(t_i) = Z_i\} \\ & + \operatorname{cov}\{[G_d(t_i)\, w_d(t_i),\, \Phi(t_{i+1}, t_i)\, x(t_i)] \mid Z(t_i) = Z_i\} \\ & + \operatorname{cov}[G_d(t_i)\, w_d(t_i)] \end{aligned} \tag{3.279} $$

where the conditioning for the fourth term was dropped since $w_d(t_i)$ and $Z(t_i)$ are
independent as previously noted. The first term in Equation (3.279) is rewritten in
expectation notation as

\begin{align}
\operatorname{cov}[\Phi(t_{i+1}, t_i)\, x(t_i) \mid Z(t_i) = Z_i]
&= E\{\Phi(t_{i+1}, t_i)[x(t_i) - \hat{x}(t_i^+)] \circ \Phi(t_{i+1}, t_i)[x(t_i) - \hat{x}(t_i^+)] \mid Z(t_i) = Z_i\} \tag{3.280} \\
&= \Phi(t_{i+1}, t_i)\, E\{[x(t_i) - \hat{x}(t_i^+)] \circ [x(t_i) - \hat{x}(t_i^+)] \mid Z(t_i) = Z_i\}\, \Phi^*(t_{i+1}, t_i) \tag{3.281} \\
&= \Phi(t_{i+1}, t_i)\, \operatorname{cov}[x(t_i) \mid Z(t_i) = Z_i]\, \Phi^*(t_{i+1}, t_i) \tag{3.282} \\
&= \Phi(t_{i+1}, t_i)\, P(t_i^+)\, \Phi^*(t_{i+1}, t_i) \tag{3.283}
\end{align}

where the second equality employs Lemma 43, from page 3-26, and lines three and
four follow from definitions for a covariance operator and then the conditional error
covariance operator. The second and third terms of Equation (3.279) are
cross-covariance terms for independent random vectors, with at least one being zero-mean,
and are thus zero. The fourth term of Equation (3.279) is expanded using the
expectation notation, while noting that $w_d$ is a zero-mean stochastic noise process,
to obtain

\begin{align}
\operatorname{cov}[G_d(t_i)\, w_d(t_i)] &= E[G_d(t_i)\, w_d(t_i) \circ G_d(t_i)\, w_d(t_i)] \tag{3.284} \\
&= G_d(t_i)\, E[w_d(t_i) \circ w_d(t_i)]\, G_d^*(t_i) \tag{3.285} \\
&= G_d(t_i)\, Q_d(t_i)\, G_d^*(t_i) \tag{3.286}
\end{align}

where Lemma 43 is used to produce the second equality and then $Q_d$ is simply the
covariance of $w_d$ defined in Equation (3.225). Therefore, with each of the terms
systematically addressed, Equation (3.279) becomes Equation (3.271).
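The covariance propagation of Equation (3.271) can be checked numerically in a finite-dimensional stand-in. The sketch below is an illustration under stated assumptions: all operators are small matrices with hypothetical values, the conditional state density is replaced by a Gaussian with covariance $P(t_i^+)$, and the sample covariance of the propagated state is compared with the predicted one.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                                   # number of Monte Carlo samples

# Hypothetical finite-dimensional stand-ins for the operators in (3.261).
Phi = np.array([[0.9, 0.1], [0.0, 0.8]])
P_plus = np.array([[1.0, 0.2], [0.2, 0.5]])   # plays the role of P(t_i^+)
Gd = np.array([[1.0], [0.5]])
Qd = np.array([[0.3]])
Bu = np.array([1.0, -1.0])                    # known input term B_d u

# Predicted covariance from Equation (3.271).
P_pred = Phi @ P_plus @ Phi.T + Gd @ Qd @ Gd.T

# Draw the state and an independent zero-mean noise, then propagate each sample.
x = rng.multivariate_normal(mean=[2.0, -1.0], cov=P_plus, size=N)
w = rng.normal(0.0, np.sqrt(Qd[0, 0]), size=(N, 1))
x_next = x @ Phi.T + Bu + w @ Gd.T

# The known input shifts the mean only, so it does not appear in the covariance.
P_sample = np.cov(x_next, rowvar=False)
```

With these values the predicted covariance is [[1.151, 0.334], [0.334, 0.395]], and the sample covariance should agree to within Monte Carlo error.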

The updated state estimator, $\hat{x}(t_i^+)$, in Equation (3.267) and the conditional
error covariance, $P(t_i^+)$, in Equation (3.269), are estimators for the state and the
conditional error covariance for the stochastic LIMVUE given in Theorem 78, as
noted in the remark following the theorem, for the $i$th measurement $Z(t_i) = Z_i$. ■

As  reported  earlier,  the  state  and  measurement-corruption  noise  were  assumed 
to  be  independent  processes.  The  following  lemma  shows  why  this  is  true  for  the 
discrete-time  case. 

Lemma 92 The state, $x(t_i)$, and measurement-corruption noise, $v(t_j)$, are
independent for all time $t_i, t_j \in T$.

Proof of Lemma 92²⁴ Recall Equation (3.224) from the definition of the dynamics
model (where we have decremented all of the time indices):

$$ x(t_i) = \Phi(t_i, t_{i-1})\, x(t_{i-1}) + B_d(t_{i-1})\, u(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \tag{3.287} $$

Using this equation, we substitute in for $x(t_{i-1})$ and we get

$$ \begin{aligned} x(t_i) ={} & \Phi(t_i, t_{i-1}) [\Phi(t_{i-1}, t_{i-2})\, x(t_{i-2}) + B_d(t_{i-2})\, u(t_{i-2}) + G_d(t_{i-2})\, w_d(t_{i-2})] \\ & + B_d(t_{i-1})\, u(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \end{aligned} \tag{3.288} $$

which can be written as

$$ \begin{aligned} x(t_i) ={} & \Phi(t_i, t_{i-2})\, x(t_{i-2}) + \Phi(t_i, t_{i-1}) [B_d(t_{i-2})\, u(t_{i-2}) + G_d(t_{i-2})\, w_d(t_{i-2})] \\ & + B_d(t_{i-1})\, u(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \end{aligned} \tag{3.289} $$

Now substitute in for $x(t_{i-2})$ to continue the pattern

$$ \begin{aligned} x(t_i) ={} & \Phi(t_i, t_{i-2}) [\Phi(t_{i-2}, t_{i-3})\, x(t_{i-3}) + B_d(t_{i-3})\, u(t_{i-3}) + G_d(t_{i-3})\, w_d(t_{i-3})] \\ & + \Phi(t_i, t_{i-1}) [B_d(t_{i-2})\, u(t_{i-2}) + G_d(t_{i-2})\, w_d(t_{i-2})] \\ & + B_d(t_{i-1})\, u(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \end{aligned} \tag{3.290} $$

Simplifying,

$$ \begin{aligned} x(t_i) ={} & \Phi(t_i, t_{i-3})\, x(t_{i-3}) \\ & + \Phi(t_i, t_{i-2}) [B_d(t_{i-3})\, u(t_{i-3}) + G_d(t_{i-3})\, w_d(t_{i-3})] \\ & + \Phi(t_i, t_{i-1}) [B_d(t_{i-2})\, u(t_{i-2}) + G_d(t_{i-2})\, w_d(t_{i-2})] \\ & + B_d(t_{i-1})\, u(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \end{aligned} \tag{3.291} $$

The nested pattern is now clear, and thus we write Equation (3.291) as

$$ x(t_i) = \Phi(t_i, t_0)\, x(t_0) + \sum_{k=1}^{i} \Phi(t_i, t_k) [B_d(t_{k-1})\, u(t_{k-1}) + G_d(t_{k-1})\, w_d(t_{k-1})] \tag{3.292} $$

where we used the fact that $\Phi(t_i, t_i)$, for any time $t_i$, is equivalent to the identity
operator. Since $v(t_j)$ is independent of each of the terms in Equation (3.292), $x(t_i)$
and $v(t_j)$ are mutually independent random vectors for all time $t_i, t_j \in T$. ■

²⁴ The proof of this lemma closely follows Maybeck [129] for finite-dimensional random vectors.
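The unrolled form in Equation (3.292) can be checked against the step-by-step recursion of Equation (3.287) with a short numerical sketch. The matrices, the random seed, and the helper name `transition` are hypothetical; the combined term $B_d u + G_d w_d$ at each step is represented by a single random forcing vector, which is an assumption made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 3, 5

# Phis[k] plays the role of Phi(t_{k+1}, t_k); forcing[k] of B_d u + G_d w_d at t_k.
Phis = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(steps)]
forcing = [rng.standard_normal(n) for _ in range(steps)]
x0 = rng.standard_normal(n)

def transition(i, k):
    """Phi(t_i, t_k) as the ordered product Phis[i-1] ... Phis[k]; identity if i == k."""
    out = np.eye(n)
    for j in range(k, i):
        out = Phis[j] @ out
    return out

# Step-by-step recursion, Equation (3.287).
x_rec = x0.copy()
for k in range(steps):
    x_rec = Phis[k] @ x_rec + forcing[k]

# Closed form, Equation (3.292), with i = steps.
x_closed = transition(steps, 0) @ x0 + sum(
    transition(steps, k) @ forcing[k - 1] for k in range(1, steps + 1))
```

The two results agree to rounding error, which is the algebraic content of the "nested pattern" used in the proof.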

3.6 Generalized Infinite-Dimensional Multiple Model Adaptive Estimation

The system description entails a detailed accounting of all of the parameters
used in the structure of the models and the statistics describing the dynamics and
measurement noises. Specifically, the system is determined by knowledge of the true
values for $\Phi$, $B_d$, $G_d$, $H$, $Q_d$, $R$, $\hat{x}_0$, and $P_0$. When a subset of the model parameters
is uncertain, we can characterize this subset of uncertain parameters as stochastic
processes. For this research, we restrict ourselves to a subset of these parameters,
and since their values are uncertain (but assumed constant), we express them in
terms of the components of a vector random-constant stochastic process $a(\cdot, \cdot)$. The
random-constant stochastic process, indexed by the times $T$, is a constant random
vector for all times $t_i \in T$, i.e., $a(t_i, \cdot) = a(t_i) \in \mathcal{A}$ and for a given $\omega \in \Omega$, the
realization is $a(t_i, \omega) = a \in A$, which is independent of the time index since it is
assumed to be a constant for all time. More general stochastic process models than
random-constant processes can allow the parameter to be time-varying, and this can
give rise to an interactive multiple model (IMM) rather than an MMAE algorithm,
as discussed in Section 2.4.7, page 2-38. Section 2.3.3.1 has further information for
the finite-dimensional case.

Note that the form of the equations for the elemental filters and the state and
parameter estimates looks exactly the same as that of the equations appearing in Sections
2.3.3.3 and 2.3.4. However, they are not strictly the same since in this chapter we
are dealing with the more general case of vectors in a Hilbert space, i.e., they may
be infinite-dimensional vectors, and the matrices for the finite-dimensional case are
now, in general, allowed to be transformations; thus, we have the generalized
infinite-dimensional multiple model adaptive estimation (GIMMAE).

3.6.1 Elemental Filters. Each elemental filter in the bank is based upon a
different hypothesis for the parameter values used to model the real world system,
i.e., the $k$th elemental filter design model is constructed assuming that $a(t_i) = a_k$.
The discrete-time model equations for the $k$th elemental filter are

$$ x_k(t_{i+1}) = \Phi_k(t_{i+1}, t_i)\, x_k(t_i) + B_{dk}(t_i)\, u(t_i) + G_{dk}(t_i)\, w_{dk}(t_i) \tag{3.293} $$

$$ z(t_i) = H_k(t_i)\, x_k(t_i) + v_k(t_i) \tag{3.294} $$

where the properties of $\Phi_k(t_{i+1}, t_i)$, $B_{dk}(t_i)$, $G_{dk}(t_i)$, $H_k(t_i)$, $Q_{dk}(t_i)$, and $R_k(t_i)$ were
discussed in the previous sections.

The correctness or validity of each hypothesis, $a(t_i) = a_k$, is ordinarily
obtained through an analysis of the filter residuals, the difference between the
observed measurement and the predicted measurement, $r_k(t_i) = z_i - H_k(t_i)\, \hat{x}_k(t_i^-)$
[129]. This “correctness” information is also coded in the hypothesis conditional
probability $p_k(t_i)$, which is defined as the probability that $a(t_i)$ assumes the value
$a_k$ (for $k = 1, 2, \ldots, K$), conditioned on the observed measurement history to time
$t_i$ [130, 132]

$$ p_k(t_i) = \Pr\{a(t_i) = a_k \mid Z(t_i) = Z_i\} \tag{3.295} $$

such that

$$ p_k(t_i) \ge 0 \ \text{for all } k \quad \text{and} \quad \sum_{k=1}^{K} p_k(t_i) = 1 \tag{3.296} $$

A close inspection of Equations (2.44) through (2.46), specifically, Equation
(2.45), on page 2-30, shows that the PDF for the Gaussian distributed random
vector does not exist on a general Hilbert space since letting measurement dimension
$m$ tend to infinity results in $\beta$ equal to zero; hence we would have $f = 0$, which is not
a proper PDF²⁵. However, it is interesting to note that the hypothesis conditional
probabilities calculated using Equation (2.43), found on page 2-29, are independent
of $m$, since the $m$-dependent factor is common to the numerator and to each of the $K$ terms in the
denominator and therefore cancels. So, assuming that the initial probability $p_k(t_0)$ for all $k$ is known or
well modeled, for example, as $p_k(t_0) = 1/K$ for $k = 1, \ldots, K$, hypothesis conditional
probabilities are determined as


$$ p_k(t_i) = \frac{\tilde{f}_{z(t_i) \mid a(t_i), Z(t_{i-1})}(z_i \mid a_k, Z_{i-1})\; p_k(t_{i-1})}{\sum_{j=1}^{K} \tilde{f}_{z(t_i) \mid a(t_i), Z(t_{i-1})}(z_i \mid a_j, Z_{i-1})\; p_j(t_{i-1})} \tag{3.297} $$

This is an iteration expressed in terms of the previous values $p_k(t_{i-1})$, where the
scaled conditional probability “density” function, as denoted by $\tilde{f}$, is a
Gaussian-like function with “mean” $H_k(t_i)\, \hat{x}_k(t_i^-)$ and “covariance” (operator) $A_k(t_i)$

$$ \tilde{f}_{z(t_i) \mid a(t_i), Z(t_{i-1})}(z_i \mid a_k, Z_{i-1}) = \tilde{\beta}_k(t_i) \exp\left\{-\tfrac{1}{2} L_k(t_i)\right\} \tag{3.298} $$

where the modified scale factor is now

$$ \tilde{\beta}_k(t_i) = \frac{1}{|A_k(t_i)|^{1/2}} \tag{3.299} $$

and where the likelihood quotient in Equation (3.298), which is a measure of the
“correctness” of the parameter values for this particular model [130], is the weighted
inner product

$$ L_k(t_i) = \langle r_k(t_i),\, A_k^{-1}(t_i)\, r_k(t_i) \rangle \tag{3.300} $$

where $r_k(t_i) = z_i - H_k(t_i)\, \hat{x}_k(t_i^-)$ and $A_k(t_i)$ are the residual and associated residual
covariance calculated by the $k$th Kalman filter as in Equations (3.268) and (3.265),
respectively. Note that using $\tilde{f}$ (which is not a true PDF since the “volume”
under the $\tilde{f}$ function “surface” is not unity) rather than $f$ (which is ill-defined in the
infinite-dimensional case) has no impact on the operation of the MMAE.
Furthermore, the denominator in Equation (3.297) is simply the sum of all $K$ numerators,
and it is thus the appropriate scale factor to guarantee that the $p_k(t_i)$ values so
generated will always sum to one. Finally, note that $\tilde{\beta}_k(t_i)$ is just a scale factor and
that the most important information to be retrieved from this “density” function
is contained in $L_k(t_i)$; hence the fact that we do not have true PDFs in Equation
(3.297) in the strict sense is not problematic.

²⁵ We defined the Gaussian-distributed random vector using the characteristic equation, see page
3-32, because the PDF is ill-defined for the infinite-dimensional case.
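The claim that the dimension-dependent factor cancels in Equation (3.297) can be illustrated with a small finite-dimensional sketch. Everything here is an assumption for illustration: the residual covariances are matrices, the scale factor is taken as the reciprocal square root of the determinant, and the function name, flag, and numerical values are hypothetical.

```python
import numpy as np

def update_hypothesis_probs(p_prev, residuals, covs, include_2pi=False):
    """One pass of the recursion in Equation (3.297) for a bank of K filters."""
    weights = []
    for p, r, A in zip(p_prev, residuals, covs):
        L = r @ np.linalg.solve(A, r)              # likelihood quotient (3.300)
        beta = 1.0 / np.sqrt(np.linalg.det(A))     # scale factor without (2 pi)^(m/2)
        if include_2pi:
            beta /= (2.0 * np.pi) ** (len(r) / 2)  # finite-dimensional Gaussian factor
        weights.append(beta * np.exp(-0.5 * L) * p)
    weights = np.array(weights)
    return weights / weights.sum()                 # denominator: sum of K numerators

# Two hypothetical filters: the first has a small residual, the second a large one.
p_prev = np.array([0.5, 0.5])
residuals = [np.array([0.1, -0.2]), np.array([1.5, 2.0])]
covs = [np.eye(2), np.eye(2)]

p_new = update_hypothesis_probs(p_prev, residuals, covs)
p_new_2pi = update_hypothesis_probs(p_prev, residuals, covs, include_2pi=True)
```

Both calls return the same probabilities, because the factor that depends on the measurement dimension multiplies every numerator equally; the filter with the smaller weighted residual receives most of the probability.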

It has been shown, for the finite-dimensional case [94, 129], that the sequence
of residuals $\{r_k(t_i)\}$ resulting from linear filtering forms a zero-mean white Gaussian
sequence with known residual covariance $A_k(t_i)$. Thus, if a filter model matches
the “true” system, then the residual $r_k(t_i)$ should be a zero-mean white Gaussian
process with known residual covariance $A_k(t_i)$.

3.6.2 State and Parameter Estimates. This estimation technique uses the
information in all of the Kalman filter residuals to estimate the “true” parameter
vector in effect. This technique is optimal when there is a unique filter for each
possible combination of parameter values, which is only possible when the parameter(s)
of interest takes on just a finite number of possible values. We shall populate the
filter bank with $K$ filters, each based on one of the $K$ unique parameter vectors:
$\{a_1, a_2, \ldots, a_K\}$.

From the Bayesian point of view, the MMAE framework can be used to
compute a state (or parameter) estimate that is characterized by minimizing the MSE
between the predicted and measured state estimates; this is most often called an
MMSE estimate and is the conditional mean. We identify the Bayesian estimate as
the standard MMAE estimate and write it as [130, 132]

$$ \hat{x}_{\mathrm{MMAE}}(t_i^+) = E\{x(t_i) \mid Z(t_i) = Z_i\} = \sum_{k=1}^{K} \hat{x}_k(t_i^+)\, p_k(t_i) \tag{3.301} $$

where $\hat{x}_k(t_i^+)$ is the state estimate generated by the $k$th Kalman filter based on the
assumption that the parameter vector $a(t_i) = a_k$ for all $t_i$. The conditional covariance
of $x(t_i)$ computed by the MMAE is given by Equation (2.50) for finite-dimensional
systems [130] and for infinite-dimensional systems is

\begin{align}
P_{\mathrm{MMAE}}(t_i^+)
&= E\{[x(t_i) - \hat{x}_{\mathrm{MMAE}}(t_i^+)] \circ [x(t_i) - \hat{x}_{\mathrm{MMAE}}(t_i^+)] \mid Z(t_i) = Z_i\} \tag{3.302} \\
&= \sum_{k=1}^{K} \{P_k(t_i^+) + [\hat{x}_k(t_i^+) - \hat{x}_{\mathrm{MMAE}}(t_i^+)] \circ [\hat{x}_k(t_i^+) - \hat{x}_{\mathrm{MMAE}}(t_i^+)]\}\, p_k(t_i) \tag{3.303}
\end{align}

where $P_k(t_i^+)$ is the state error covariance computed by the $k$th Kalman filter.
Similarly, the parameter estimate is given by

$$ \hat{a}_{\mathrm{MMAE}}(t_i^+) = E\{a(t_i) \mid Z(t_i) = Z_i\} = \sum_{k=1}^{K} a_k\, p_k(t_i) \tag{3.304} $$

with conditional covariance of $a(t_i)$ for finite-dimensional systems [129] as given in
Equation (2.54), and adapted for infinite-dimensional systems as

\begin{align}
P_{a,\mathrm{MMAE}}(t_i^+)
&= E\{[a(t_i) - \hat{a}_{\mathrm{MMAE}}(t_i^+)] \circ [a(t_i) - \hat{a}_{\mathrm{MMAE}}(t_i^+)] \mid Z(t_i) = Z_i\} \tag{3.305} \\
&= \sum_{k=1}^{K} [a_k - \hat{a}_{\mathrm{MMAE}}(t_i^+)] \circ [a_k - \hat{a}_{\mathrm{MMAE}}(t_i^+)]\, p_k(t_i) \tag{3.306}
\end{align}
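The blending in Equations (3.301) through (3.306) can be sketched for finite-dimensional vectors, where the outer product $\circ$ reduces to the ordinary vector outer product. The function name `mmae_blend` and the numerical values are hypothetical and serve only to illustrate the formulas.

```python
import numpy as np

def mmae_blend(estimates, covariances, probs):
    """Probability-weighted mean (3.301) and covariance (3.303) over K filters."""
    estimates = np.asarray(estimates, dtype=float)
    probs = np.asarray(probs, dtype=float)
    blended = probs @ estimates                      # sum_k x_k p_k
    cov = np.zeros((estimates.shape[1], estimates.shape[1]))
    for xk, Pk, pk in zip(estimates, covariances, probs):
        d = xk - blended                             # spread of filter k about the blend
        cov += pk * (np.asarray(Pk, dtype=float) + np.outer(d, d))
    return blended, cov

# Two hypothetical elemental filters with equal probability.
probs = [0.5, 0.5]
x_mmae, P_mmae = mmae_blend([[0.0, 0.0], [2.0, 0.0]], [np.eye(2), np.eye(2)], probs)

# Parameter estimate (3.304) and covariance (3.306): the same formula with no P_k term.
a_mmae, P_a = mmae_blend([[1.0], [3.0]], [np.zeros((1, 1))] * 2, probs)
```

The second term in the covariance accounts for disagreement among the elemental filters, so the blended covariance is never smaller than the probability-weighted average of the individual covariances.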


3.7 Summary

The early part of this chapter focused on defining various mathematical
concepts needed to construct the linear infinite-dimensional minimum variance unbiased
estimator (LIMVUE) rigorously; this is a central tool in the building of the
infinite-dimensional sampled-data Kalman filter (ISKF). Along the way, we introduced the
illustrative boxology technique that we use to convey the ISKF development, from
defining the primary probability space to the spaces occupied by the random vectors,
to the probability spaces induced by the random vectors representing the noises, the
observations, and the state. We generalized the LIMVUE for stochastic processes
to create a stochastic sequential estimator. Since the majority of the problems
that we study are described by infinite-dimensional continuous-time models, we
extended the known finite-dimensional method for creating an equivalent discrete-time
model from its corresponding continuous-time model for infinite-dimensional models.
The dynamics model provides us with a tool to propagate the Gaussian conditional
state “density” between measurements, the first two moments of which are
estimated optimally using the stochastic LIMVUE. Then, we assembled the pieces to
form the ISKF, thus completing the array of filtering techniques that began with
Kalman’s first (discrete-time) filter [95] and the continuous-time Kalman-Bucy
filter [96] shortly thereafter; subsequently, Falb [51] extended the Kalman-Bucy
filter (for continuous-time measurements) to encompass systems with an
infinite-dimensional continuous-time description by developing it on a Hilbert space. The
chapter closed with a short discussion and report on the modified formulae for the
multiple model adaptive estimation (MMAE) methodology as applicable to a bank
of elemental ISKFs, the GIMMAE.
IV. An Example: The Stochastic Heat Equation


4.1 Introduction

The state space model used in this stochastic estimation research is based on a
pair of mathematical expressions: a state equation that defines the evolution of a
dynamic process through time and a measurement equation that defines the observation
process. In general, these equations may be either stochastic or deterministic; we
shall investigate the stochastic case. The state space model may be based on
differential or difference equations; our work encompasses both varieties. When presented
with a continuous-time model featuring differential equations we shall re-express
them using their equivalent discrete-time difference equations. Finally, while the
dimension of the state space model is allowed to be infinite for theoretical purposes,
for computational purposes, we must have finite-dimensional equations. Therefore,
we shall present a straightforward, yet novel, method for reposing the problem on
a finite-dimensional subspace.

In this chapter we will apply the theoretical methods developed in the preceding
chapter to a physically meaningful problem. We will estimate the temperature along
a slender cylindrical rod modeled by the stochastic heat equation, a parabolic partial
differential equation (PDE), using noise-corrupted finite-dimensional measurements¹.
This example is a special case of the general theory developed in Chapter III since
our state, the temperature, is a scalar, while the observations are recorded in a
finite-dimensional vector of scalars. Many textbooks on PDEs, such as Berg and
McGregor [20] written for mathematicians, Farlow [52] for scientists and engineers,
and Gockenbach [62] for numerical-computational scientists and engineers, contain
an exposition on the heat equation. However, all of these texts solve the deterministic
problem as if the model were exact. Phillipson [161] also solved the deterministic
problem, but he employed a least squares approach that admitted that there was
some uncertainty in the temperature (or state of the system). He employed the
Galerkin method² to find two approximate solutions for the state expressed as a
linear combination of the eigenfunctions of the system and another useful ad hoc
approach using a linear combination of cubic splines to speed up convergence [161].

¹ This example was inspired by Example 5.39 in Curtain and Pritchard [38].

In a recent paper, Leland [116] presented a method for treating parabolic PDEs
(such as the stochastic heat equation problem) using a maximum likelihood estimator
for the parameter of interest, such as the thermal diffusivity. He used an approximate
time-invariant one-step predictor to avoid solving state and covariance equations;
additionally, his system was presumed to be in steady state. This paper by Leland is
representative of the other papers reviewed during this research in that it does not
include a measurement model to take into account the measurement process.

In this research, we employ an evolution equation to model the time-varying
dynamics of the system in question and a measurement model in order to estimate
the state of the system optimally. As we shall soon demonstrate, the method that
we employ to create the essentially-equivalent finite-dimensional discrete-time model
from the infinite-dimensional continuous-time model allows us to use the
infinite-dimensional sampled-data Kalman filter (ISKF) without additional approximations.
The resulting algorithm looks and behaves like the finite-dimensional sampled-data
Kalman filter that was reported in Chapter II.

4.2 Mathematical System Model

Creation of the mathematical system model is the first step in model-based
estimation. We have used familiar models for the dynamics and measurement processes
for two primary reasons: (1) to emphasize the applicability of the theory
developed in the previous chapter and (2) to illustrate the techniques developed in this
chapter to transform an infinite-dimensional problem into an essentially-equivalent
finite-dimensional problem that one can easily implement on a digital computer.

² Numerous texts contain a paragraph, section, or chapter devoted to the explication of the
Galerkin method used to solve PDEs [89, 30, 62].


4.2.1 Preliminary Background: The Heat Equation. The purpose of this
extended example is to demonstrate an exact method for employing the ISKF to
estimate the temperature profile along the length of a slender cylindrical rod over time.
Let’s begin by describing the deterministic heat equation with Neumann boundary
conditions in one dimension augmented by a variable heat source [52]:

$$ \frac{\partial}{\partial t} x(t, p) = \kappa \frac{\partial^2}{\partial p^2} x(t, p) + u(t, p), \quad 0 < p < 1, \quad 0 < t < \infty \tag{4.1} $$

$$ \frac{\partial}{\partial p} x(t, 0) = 0, \quad 0 < t < \infty \tag{4.2} $$

$$ \frac{\partial}{\partial p} x(t, 1) = 0, \quad 0 < t < \infty \tag{4.3} $$

$$ x(0, p) = x_0(p), \quad 0 < p < 1 \tag{4.4} $$

where $x$ is the temperature in degrees Celsius (°C), $t$ is the time in seconds, $\kappa > 0$
is the thermal diffusivity constant of the material in square meters per second, the
heat source is at $u$ °C, and the position along the rod is indicated by $p$. Additionally,
the rod has an initial temperature of $x_0(p)$ and is laterally insulated with insulated
ends (boundaries). Now let’s add a random component to the heat source, which
also makes the temperature random; then Equations (4.1) through (4.4) become


$$ \frac{\partial}{\partial t} x(t, p) = \kappa \frac{\partial^2}{\partial p^2} x(t, p) + u(t, p) + w(t, p), \quad 0 < p < 1, \quad 0 < t < \infty \tag{4.5} $$

$$ \frac{\partial}{\partial p} x(t, 0) = 0, \quad 0 < t < \infty \tag{4.6} $$

$$ \frac{\partial}{\partial p} x(t, 1) = 0, \quad 0 < t < \infty \tag{4.7} $$

$$ x(0, p) = x_0(p), \quad 0 < p < 1 \tag{4.8} $$

where $w$ is a zero-mean white Gaussian noise process with strength $Q$ and $x_0$ is a
Gaussian random variable with mean $\hat{x}_0$ and covariance $P_0$. Note that Equations
(4.5) through (4.8) only represent the stochastic heat equation in a formal sense since
the solution is not properly defined in this “white-noise” continuous-time notation.
A properly defined model was given in Section 3.4 and we will use those results
in our example to describe properly the mathematical system model used for
estimating the temperature (state) of the slender cylindrical rod with noise-corrupted
measurements.
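For orientation, the deterministic part of Equations (4.1) through (4.4) with no heat source has a well-known modal solution: with insulated ends on the unit interval, the spatial eigenfunctions are $1$ and $\sqrt{2}\cos(n\pi p)$, and the $n$th modal coefficient decays as $\exp(-\kappa n^2 \pi^2 t)$. The Python sketch below uses this fact to form a diagonal discrete-time transition matrix on a few modes. It is an illustration under stated assumptions, not the projection method developed later in this chapter; the diffusivity value, sample period, grid, number of modes, and initial profile are all hypothetical.

```python
import numpy as np

kappa = 0.01                       # hypothetical thermal diffusivity
dt = 1.0                           # hypothetical sample period t_{i+1} - t_i
n_modes = 8                        # number of retained cosine modes
p = np.linspace(0.0, 1.0, 2001)    # spatial grid on the rod

def mode(n):
    """Orthonormal Neumann eigenfunctions on [0, 1]."""
    return np.ones_like(p) if n == 0 else np.sqrt(2.0) * np.cos(n * np.pi * p)

def trapezoid(y):
    """Composite trapezoidal rule on the grid p."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(p)))

# Hypothetical initial temperature profile and its modal coefficients.
x0 = 20.0 + 5.0 * np.cos(np.pi * p)
coeffs = np.array([trapezoid(x0 * mode(n)) for n in range(n_modes)])

# Diagonal state transition over one sample period (unforced, deterministic case).
Phi = np.diag(np.exp(-kappa * (np.arange(n_modes) * np.pi) ** 2 * dt))
coeffs_next = Phi @ coeffs

# Reconstruct the temperature profile at the next sample time.
x_next = sum(c * mode(n) for n, c in enumerate(coeffs_next))
```

The zeroth coefficient, which is the mean temperature of the rod, is unchanged by the transition, as expected for a rod that is insulated on all sides.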

4.2.2 Model Mapping. In Chapter I we introduced a concept for mapping
an infinite-dimensional continuous-time model, a projection of the real world onto a
linear systems mind-set, to the equivalent infinite-dimensional discrete-time model;
the infinite-dimensional sampled-data Kalman filter (ISKF) described in Section 3.5
was derived specifically for this case. We continued by projecting this equivalent
model onto a finite-dimensional subspace to produce an essentially-equivalent
finite-dimensional discrete-time model that we can use to design an optimal filter with
which to estimate the state by means of the sampled-data Kalman filter. Figure
4.1, an important part of Figure 1.1, gives an overview of the process. In this
section, we map the infinite-dimensional continuous-time model to the equivalent
infinite-dimensional discrete-time model, i.e., we execute the optimal discretization
of the time variable, the conceptual $\mathcal{T}_{\mathrm{opt}}$ operation, using the technique and results
presented in Section 3.4. In the next two sections we prepare the model for the
Kalman filter by projecting our equivalent infinite-dimensional discrete-time model
onto a finite-dimensional subspace; with this conceptual $\mathcal{S}_{\mathrm{opt}}$ operator we obtain the
essentially-equivalent finite-dimensional discrete-time model. Then we construct the
conceptual $\mathcal{F}_{\mathrm{opt}}$ by determining the finite-dimensional components of the
sampled-data Kalman filter in Section 4.3. In Chapter V we use the results from this chapter
to simulate the flow of heat through the slender cylindrical rod and to estimate the
temperature using noisy measurements.
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Figure  4.1  Mapping  the  Infinite-Dimensional  Continuous-Time  Model  to  an 
Essentially-Equivalent  Finite-Dimensional  Discrete-Time  Model. 


4.2.3 Discrete-Time Measurement Model. Let the following integral represent the operation of a sensor collecting information about the temperature of the rod at time $t_i$

$$z(t_i) = \int_0^1 x(t_i, p)\,dp + v(t_i) \tag{4.9}$$

where z, x, and v are (scalar) stochastic functions in Hilbert spaces of random variables. However, this model would only allow us to calculate the average temperature of the slender cylindrical rod. So we propose to partition the integral into M segments to represent a set of M sensors positioned along M equi-length sections of the rod

$$\int_0^1 x(t_i, p)\,dp = \int_0^{1/M} x(t_i, p)\,dp + \cdots + \int_{(M-1)/M}^{1} x(t_i, p)\,dp \tag{4.10}$$

If we were to evaluate Equation (4.10) directly, then we would lose all of the spatial information. So, in order to preserve the spatial information, we propose to stack the M integrals in an observation vector to represent the contributions of M individual sensors

$$z(t_i) = \begin{bmatrix} \int_0^{1/M} x(t_i, p)\,dp \\ \vdots \\ \int_{(M-1)/M}^{1} x(t_i, p)\,dp \end{bmatrix} + v(t_i) = H\,x(t_i) + v(t_i) \tag{4.11}$$

where $z(t_i)$ is a random M-vector of observations at time $t_i$ of the temperature $x(t_i, p)$ as a function of position p, H is a linear transformation defined by the vector of M integrals acting on the state, and $v(t_i)$ is a random noise vector. More specifically, the random vector $z(t_i) \in \mathcal{Z} = L^2(\Omega, P; Z)$ is a Lebesgue $L^2$ function³ (in a separable Hilbert space of functions) that maps the sample space⁴ $\Omega$ to the realization space $Z = \mathbb{R}^M$, as shown in Figure 4.2, while the scalar random variable $x(t_i) \in \mathcal{X} = L^2(\Omega, P; X)$, which is also a Lebesgue $L^2$ function, maps the sample space to another separable Hilbert space of Lebesgue $L^2$ functions on the interval $0 < p < 1$, written as $X = L^2([0,1], \mathbb{R})$. We sometimes substitute the shorthand $L^2_{[0,1]}$ for the more explicit notation $L^2([0,1], \mathbb{R})$. A realization of the observation at time $t_i$ is labeled $z_i \in \mathbb{R}^M$. The zero-mean Gaussian noise process v has covariance matrix $R(t_i)$ at time $t_i$ and covariance kernel

$$E[v(t_i)\,v^T(t_j)] = \begin{cases} R(t_i), & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \tag{4.12}$$

with $R(t_i), 0 \in \mathbb{R}^{M \times M}$; thus v is also a white process.
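As a minimal numerical sketch of the stacked measurement in Equation (4.11), the fragment below approximates each of the M segment integrals with a midpoint rule. The helper names (`midpoint_integral`, `apply_H`), the test temperature profile, and the quadrature resolution are illustrative assumptions that do not appear in the original development, and the noise term $v(t_i)$ is omitted.

```python
import math

def midpoint_integral(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def apply_H(x, M):
    """Return the M segment integrals of x over [(m-1)/M, m/M], m = 1..M."""
    return [midpoint_integral(x, (m - 1) / M, m / M) for m in range(1, M + 1)]

# Hypothetical temperature profile at one sample time (illustration only).
def profile(p):
    return 1.0 + 0.5 * math.cos(math.pi * p)

segments = apply_H(profile, M=4)                 # stacked form, Equation (4.11)
average = midpoint_integral(profile, 0.0, 1.0)   # single-sensor form, Equation (4.9)
```

Summing the entries of `segments` recovers the single integral of Equation (4.10), while the individual entries retain the coarse spatial information that the single integral discards.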

³Recall from Chapter III that the Lebesgue $L^2$ functions form a Banach space with finite norm $\|x(t_i)\| = \left[\int_0^1 |x(t_i, p)|^2\,dp\right]^{1/2}$, as defined in an example on page 3-7. Furthermore, the $L^2$ functions are absolutely square integrable. When associated with an inner product, the $L^2$ Lebesgue functions form a Hilbert space [122]. We need the completeness of a Hilbert space to assure ourselves that all of its Cauchy sequences converge to a limit in the space, and thus any Cauchy sequence of random vectors will also converge to a limit within the space [154]. An example on page 3-27 gives the interpretation of the norm for Lebesgue functions representing the mapping induced by the random vectors.

⁴Definition 41, on page 3-24, defines the complete probability space, denoted by the triplet $(\Omega, \mathcal{F}, P)$, that we employ in this research, where $\Omega$ is a non-empty set called the sample space, $\mathcal{F}$ is a $\sigma$-field which consists of all of the subsets of $\Omega$, called events, and P is the probability measure.

[Figure 4.2: diagram with panels labeled Probability Space, Random Vector Space ($\mathcal{Z} = L^2(\Omega, P; Z)$, containing $z(t_i)$), and Realization Space ($Z$, containing $z_i$).]

Figure 4.2 Boxology of the Random Measurement Vector.

4.2.4 Continuous-Time Dynamics Model. The temperature distribution of a slender cylindrical rod is well modeled by a scalar heat equation with additive noise. Thus, we begin by writing the scalar heat equation as a stochastic differential equation in differential form (versus the familiar derivative form) to guarantee the existence and uniqueness of the solution [38], just as we did in Equation (3.223) on page 3-72

$$dx(t, p) = [F(p)\,x(t, p) + B\,u(t, p)]\,dt + db(t), \qquad x(0, p) = x_0(p) \tag{4.13}$$

where x is the stochastic temperature, u is a heat source, b is a noise process described at the end of this subsection, $B = 1$ is a constant input distributor⁵, and F is a time-invariant second-order partial differential operator. F is defined for every $x \in \mathcal{D}(F)$ as

$$Fx = \kappa \frac{\partial^2 x}{\partial p^2} \quad \text{for } \kappa \in (0, \infty),\ p \in (0, 1) \tag{4.14}$$

for a domain described by

$$\mathcal{D}(F) = \{x \in X : x_p, x_{pp} \in X;\ x_p(0) = 0 = x_p(1)\} \tag{4.15}$$

and $\kappa$ is the material thermal diffusivity constant. Since the stochastic temperature $x(t, p)$, for $0 < p < 1$, is a continuous function over [0, 1], it is a member of an infinite-dimensional vector space⁶. The stochastic state process, x, is a collection of random variables $\{x(t) : t \in [t_0, t_f]\}$. The random variable $x(t) \in \mathcal{X}$ is the same as defined in the previous section; thus, $\mathcal{X} = L^2(\Omega, P; L^2([0,1], \mathbb{R}))$. The noise process⁷ b is an X-valued Wiener process⁸, described in Definition 61, with a time-invariant (constant) diffusion $Q \in \mathbb{R}$. The initial state $x_0$ is an X-valued Gaussian random process, see Definitions 59 and 51, with covariance operator $P_0 \in \mathcal{BLO}(X)$, where $\mathcal{BLO}(X)$ denotes the linear space of bounded linear operators (BLOs) on X.

⁵We have retained B to demonstrate how it flows through the development.

⁶An infinite-dimensional linear space requires an infinite number of basis vectors (in this case functions) to span the space. For example, the set of functions $y(p)$ defined on the interval [0, 1] requires an infinite number of basis functions to span the space. Thus, the scalar function $y(p)$ has an infinite dimension.
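The operator F of Equation (4.14) and its domain in Equation (4.15) can be probed numerically. The sketch below assumes that the boundary conditions are zero slope at both ends of the rod and checks, by finite differences, that a cosine mode $\cos(n\pi p)$ satisfies those conditions and behaves as an eigenfunction of F with eigenvalue $-\kappa n^2\pi^2$. The diffusivity value, step sizes, and function names are hypothetical choices made only for illustration.

```python
import math

KAPPA = 0.1  # hypothetical diffusivity value, chosen only for illustration

def F(x, p, h=1e-4):
    """Central-difference approximation of kappa * d^2 x / dp^2 at p."""
    return KAPPA * (x(p + h) - 2.0 * x(p) + x(p - h)) / (h * h)

def dx_dp(x, p, h=1e-6):
    """Central-difference approximation of the first derivative at p."""
    return (x(p + h) - x(p - h)) / (2.0 * h)

def mode(n):
    """Cosine mode cos(n*pi*p); its slope vanishes at p = 0 and p = 1."""
    return lambda p: math.cos(n * math.pi * p)

n = 3
x = mode(n)
p0 = 0.3
eigenvalue = -KAPPA * (n * math.pi) ** 2
# If x is an eigenfunction, F x - eigenvalue * x should be close to zero.
residual = F(x, p0) - eigenvalue * x(p0)
```

The negative eigenvalues are what later produce the decaying exponential factors in the state transition operator.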

4.2.5 Equivalent Discrete-Time Dynamics Model. The equivalent discrete-time dynamics model for our scalar heat equation given in Equation (4.13) follows from Equation (3.224) given on page 3-73:

$$x(t_{i+1}) = \Phi(t_{i+1} - t_i)\,x(t_i) + B_d(t_i)\,u(t_i) + w_d(t_i) \tag{4.16}$$

where $\Phi(t_{i+1} - t_i)$ is the state transition operator that we will define after we first discuss the equivalent discrete-time input distributor, $B_d$,

$$B_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1} - s)\,B\,ds \tag{4.17}$$

provided $u(t)$ is a piece-wise constant function that is constant over each sample period. $G_d(t_i)$ is not shown explicitly since it is an identity operator, and the equivalent discrete-time dynamics noise process $w_d$ is a zero-mean white Gaussian process defined at each time $t_i$ by

$$w_d(t_i) = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1} - s)\,db(s) \tag{4.18}$$

with equivalent discrete-time dynamics noise covariance kernel operator $\mathcal{S}[w_d(t_i), w_d(t_k)] = 0$ for $t_i \neq t_k$, and covariance operator whenever $t_i = t_k$:

$$\mathcal{S}[w_d(t_i)] = \int_{t_i}^{t_{i+1}} \Phi(t_{i+1} - s)\,Q\,\Phi^*(t_{i+1} - s)\,ds = Q_d(t_i) \tag{4.19}$$

⁷In general, this noise process would be a function of both time and space; however, to make this example more tractable, it is assumed to be a time-varying, spatially-invariant function.

⁸The Wiener process is also known as Brownian motion (and thus the choice of notation b); see Definition 61 on page 3-39. For an interesting discussion on the Brownian motion process, see, for example, the book by Wiener, et al. [209]. In the derivative form of Equation (4.13), $db/dt$ would represent the additive white noise; this is not the same as a Wiener process. Heuristically, we often treat the hypothetical time derivative of the Wiener process, $db/dt$, as white Gaussian noise. In short, do not confuse this Brownian motion b with a white Gaussian noise process!


The state transition operator $\Phi$ is, in general, a function of both time arguments, but here it is a function of a single parameter (the time difference) since F is time-invariant, as can be seen in Equation (4.14), and is defined as⁹

$$[\Phi(t - t_0)\,x_0](p) = \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 (t - t_0)} \cos(n\pi p) \int_0^1 x_0(p') \cos(n\pi p')\,dp' \tag{4.20}$$

For a discrete set of times, T, the state transition operator $\Phi$ becomes

$$[\Phi(t_{i+1} - t_i)\,x(t_i)](p) = \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 (t_{i+1} - t_i)} \cos(n\pi p) \int_0^1 x(t_i, p') \cos(n\pi p')\,dp' \tag{4.21}$$
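To make Equation (4.21) concrete, the following sketch applies a truncated version of the series to a simple profile. It assumes the sum runs over all integers n, so the terms for n and -n are folded together, and it replaces the inner integral with a midpoint rule. The names `apply_phi` and `integrate`, the truncation limit `n_max`, and the numerical values of `kappa` and `dt` are hypothetical and serve only as illustration.

```python
import math

def integrate(f, n=400):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

def apply_phi(x, dt, kappa, n_max=20):
    """Return the function p -> [Phi(dt) x](p), truncating the sum to |n| <= n_max."""
    coeffs = []
    for n in range(n_max + 1):
        # Terms for n and -n are identical: weight 1 for n = 0, 2 otherwise.
        weight = 1.0 if n == 0 else 2.0
        proj = integrate(lambda q, n=n: x(q) * math.cos(n * math.pi * q))
        coeffs.append(weight * math.exp(-kappa * (n * math.pi) ** 2 * dt) * proj)
    return lambda p: sum(c * math.cos(n * math.pi * p) for n, c in enumerate(coeffs))

kappa, dt = 0.05, 0.2                         # hypothetical values
x0 = lambda p: 2.0 + math.cos(math.pi * p)    # hypothetical initial profile
x1 = apply_phi(x0, dt, kappa)
decay = math.exp(-kappa * math.pi ** 2 * dt)  # damping of the n = 1 mode
```

For this profile the constant part passes through unchanged and the $\cos(\pi p)$ part is scaled by `decay`, which is the behavior expected of heat flow in a rod with insulated ends.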


4.2.6 Equivalent Infinite-Dimensional Discrete-Time Model. To summarize, we shall employ the results from Chapter III as a template for composing the dynamics and measurement model equations for this stochastic heat equation example. The results in Section 3.4 enable us to write down the important equations for the dynamics model that we have formally written in the introduction to this chapter. Since we want to use the ISKF to estimate the temperature optimally along the slender cylindrical rod, we require that our measurement model match the form of a generalized stochastic measurement model as given in Definition 76. We satisfied this requirement with a vector measurement model described in Section 4.2.3. Note that this vector measurement process, z, is finite-dimensional, a special case of the general theory developed in Chapter III, which will adequately serve to illustrate this research. Therefore, we may use the ISKF to estimate the scalar state process, x, at times $T = \{t_0, t_1, \ldots, t_{final}\}$ using the dynamics equation

$$x(t_{i+1}) = \Phi(t_{i+1} - t_i)\,x(t_i) + B_d(t_i)\,u(t_i) + w_d(t_i) \tag{4.16}$$

where $\Phi(t_{i+1} - t_i)$, $B_d(t_i)$, and $w_d(t_i)$ are as previously described, for known control inputs $u(t_i)$, given noise-corrupted sampled-data vector measurements of the form

$$z(t_i) = H\,x(t_i) + v(t_i) \tag{4.22}$$

where the measurement transformation, H, creates a column vector of M integrals over the state as previously given in Equation (4.11), and v is the zero-mean white Gaussian noise process.

⁹It is not a simple process to find the state transition (or evolution) operator for a time-varying F; see, for example, Engel and Nagel [48] for more evolution operators or Maybeck [129] for assistance in determining state transition matrices.

4.2.7 Essentially-Equivalent Finite-Dimensional Discrete-Time Model. In accordance with our stated methodology expressed by Figure 4.1 on page 4-5, we started with an infinite-dimensional continuous-time model, mapped it to the equivalent infinite-dimensional discrete-time model, and then created an essentially-equivalent finite-dimensional discrete-time model. We shall use the ISKF with a finite-dimensional approximation of the state function to estimate the state of the system optimally on a particular subspace described by an essentially-equivalent finite-dimensional discrete-time model; in essence, we will be using the finite-dimensional sampled-data Kalman filter.

During the derivation of the ISKF we made no attempt to define the attributes of the various transformations, operators, and functions¹⁰ beyond what was needed to use the tools of functional analysis properly. However, in the example developed in this chapter, we have defined these transformations fully. Generally, it is not feasible to implement an infinite-dimensional transformation using a finite-dimensional computer algorithm with limited computational capabilities. Thus, our next task is to find appropriate matrix representations of the infinite-dimensional transformations on a finite-dimensional basis¹¹.

¹⁰Transformations, operators, and functions are all mappings; hence, we will often use the term transformations to refer to all of them when it is not important to distinguish between the various types of mappings.

The following process is carried out for each of the transformations and operators present in the ISKF. Let T be a transformation acting on the state, $x \in X$, and let $Tx \in Y$. We shall approximate the state function by projecting the state onto a finite-dimensional subspace of X, namely $\mathcal{P}X = \{\tilde{x} = \mathcal{P}x : \forall x \in X\} \cong \mathbb{R}^N$, where $\mathcal{P}$ is a projection operator and $\mathbb{R}^N$ is an N-dimensional Euclidean space. The approximate state is written as $\tilde{x} = \alpha^T \beta$, where $\alpha$ is a vector of coefficients corresponding to the vector of basis elements, $\beta$. For the approximate state, $\tilde{x}$, we have $T\tilde{x} \in \mathcal{P}Y$. Evaluation of $T\tilde{x}$ yields an expression containing a finite-dimensional matrix representation of the transformation, denoted by $\tilde{T}$. (Note that the tilde above T is used to denote the finite-dimensional approximation of the infinite-dimensional function or transformation.) Hence, whenever the state function x is limited to a finite-dimensional subspace of the infinite-dimensional space, it reduces the dimensionality of the infinite-dimensional transformation such that we are left with a finite-dimensional matrix representation!

Admittedly, there are many ways to approximate the state function. Since the aim of this research is more in line with "demonstrating the concept" rather than "optimizing the methodology", we have chosen to approximate the state function with the first N terms of a Fourier series expansion of the function; this will be discussed in more depth in the next section. Note that the finite-dimensional matrix representation of an infinite-dimensional transformation depends on the finite-dimensional approximation of the state function.

¹¹This process is conceptually analogous to reduced order filtering; in reduced order filtering, relatively unimportant states of the system "truth" model are removed or combined with other states in order to reduce the complexity of the model used to design a filter such that it becomes a more cost-effective implementation [129, 175, 176, 71].

Thus, instead of propagating and updating the actual infinite-dimensional transformations and functions as defined in the previous sections, we will propagate and update matrices and vectors that represent these infinite-dimensional transformations and functions on a finite-dimensional subspace. In the end, our implemented algorithm will be nearly identical to the standard sampled-data Kalman filter.

For finite-dimensional approximations of the state, input, and noise functions, the system dynamics described by Equation (4.16) becomes

$$\tilde{x}(t_{i+1}) = \tilde{\Phi}(t_{i+1} - t_i)\,\tilde{x}(t_i) + \tilde{B}_d(t_i)\,\tilde{u}(t_i) + \tilde{w}_d(t_i) \tag{4.23}$$

and Equation (4.22) for a given noise-corrupted sampled-data vector of measurements takes the form

$$z(t_i) = \tilde{H}\,\tilde{x}(t_i) + v(t_i) \tag{4.24}$$

Just as we often work calculus and algebra problems as far as possible before evaluating them for a particular numerical solution, we usually develop ancillary relationships prior to employing the characteristics of the finite-dimensional approximations. If we were to make the approximations first, then we would have to repeat many steps of the derivations each time the approximation changed. The results are the same either way; only the workload changes.

4.2.8 Approximating the Infinite-Dimensional State Via Projection. It is impractical to determine the exact (infinite-dimensional) temperature function x along the slender cylindrical rod using a finite number of sensors. Therefore, we desire a suitable approximation for x. Since we know that the function x is an element in a separable Hilbert space, it can be expanded in a Fourier series [154] without approximation for time $t_i$ as

$$x(t_i, p) = \sum_{n=0}^{\infty} \langle x(t_i, p), \beta_n(p) \rangle\,\beta_n(p) = \sum_{n=0}^{\infty} \alpha_n(t_i)\,\beta_n(p) \tag{4.25}$$

with coefficients $\alpha_n$ defined as

$$\alpha_n(t_i) = \langle x(t_i, p), \beta_n(p) \rangle = \int_0^1 x(t_i, p)\,\beta_n(p)\,dp \tag{4.26}$$

and the orthonormal set $\mathcal{B} = \{\beta_0(p), \beta_1(p), \ldots\}$ defined as

$$\beta_n(p) = \begin{cases} 1, & n = 0 \\ \sqrt{2}\cos(n\pi p), & n > 0 \end{cases} \tag{4.27}$$

forms an orthonormal basis for this Hilbert space. Thus, in this formulation of x, we may have to compute a countably infinite number of coefficients at each time $t_i$ in order to estimate the temperature along the rod accurately. Each coefficient, $\alpha_n$, is computed by taking the inner product of the state with the corresponding member of the basis, say $\beta_n$; then

$$\langle x(t_i, p), \beta_n(p) \rangle = \left\langle \sum_{m=0}^{\infty} \alpha_m(t_i)\,\beta_m(p),\ \beta_n(p) \right\rangle = \sum_{m=0}^{\infty} \alpha_m(t_i)\,\langle \beta_m(p), \beta_n(p) \rangle \tag{4.28}$$

The orthogonality property of an orthonormal basis reduces the inner product to

$$\langle \beta_m, \beta_n \rangle = \delta_{mn} = \begin{cases} 1, & m = n \\ 0, & m \neq n \end{cases} \tag{4.29}$$

where $\delta_{mn}$ is known as the Kronecker delta. Hence we can write Equation (4.28) as

$$\langle x(t_i, p), \beta_n(p) \rangle = \sum_{m=0}^{\infty} \alpha_m(t_i)\,\delta_{mn} \tag{4.30}$$

The right-hand side is clearly $\alpha_n(t_i)$ since $\delta_{mn}$ is zero for every term except $m = n$. Therefore, Equation (4.26) is true and Equation (4.25) has been completely justified.
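The orthonormality relation of Equation (4.29) and the coefficient formula of Equation (4.26) are easy to check numerically. The sketch below takes $\beta_0 = 1$ and $\beta_n = \sqrt{2}\cos(n\pi p)$ for $n > 0$, approximates the inner product with a midpoint rule, and recovers the coefficients of a profile built from known coefficients. The function names, the quadrature resolution, and the test profile are illustrative assumptions.

```python
import math

def beta(n):
    """Basis function beta_n of Equation (4.27), taking beta_0 = 1."""
    if n == 0:
        return lambda p: 1.0
    return lambda p: math.sqrt(2.0) * math.cos(n * math.pi * p)

def inner(f, g, n=400):
    """Midpoint-rule approximation of the L2([0, 1]) inner product <f, g>."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n))

# Gram matrix of the first four basis functions; close to the identity matrix.
gram = [[inner(beta(m), beta(n)) for n in range(4)] for m in range(4)]

# Hypothetical profile assembled from known coefficients 3.0 and -0.5.
x = lambda p: 3.0 * beta(0)(p) - 0.5 * beta(2)(p)
alpha = [inner(x, beta(n)) for n in range(4)]  # Equation (4.26), n = 0..3
```

Because the Gram matrix is (numerically) the identity, each inner product in `alpha` isolates exactly one coefficient, which is the content of Equation (4.30).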
The infinite-dimensional scalar state function x can be well approximated by projecting it onto a finite-dimensional subspace, truncating the Fourier series expansion after N terms. This makes good sense from an engineering point of view: the information in the high frequency terms is often dominated by noise; hence we are effectively low-pass filtering the state with an ideal low-pass filter. Thus, projecting x onto a finite N-dimensional space is accomplished by

$$\tilde{x}(t_i, p) = [\mathcal{P}x](t_i, p) = \mathcal{P} \sum_{n=0}^{\infty} \alpha_n(t_i)\,\beta_n(p) \tag{4.31}$$

where $\mathcal{P}$ is the projection operator, the coefficients $\alpha_0, \alpha_1, \ldots$ are as defined above in Equation (4.26), and a basis for this subspace is given in Equation (4.27). Thus,

$$\tilde{x}(t_i, p) = \sum_{n=0}^{N-1} \alpha_n(t_i)\,\beta_n(p) \tag{4.32}$$

and the basis is denoted by

$$\mathcal{B}_N = \{\beta_0(p), \beta_1(p), \ldots, \beta_{N-1}(p)\} \tag{4.33}$$

A more convenient form for Equation (4.32), using vector multiplication, is

$$\tilde{x}(t_i, p) = \alpha^T(t_i)\,\beta(p) \tag{4.34}$$

where $\alpha(t_i)$ and $\beta(p)$ are defined as column vectors:

$$\alpha(t_i) = \begin{bmatrix} \alpha_0(t_i) & \alpha_1(t_i) & \cdots & \alpha_{N-1}(t_i) \end{bmatrix}^T \tag{4.35}$$

$$\beta(p) = \begin{bmatrix} \beta_0(p) & \beta_1(p) & \cdots & \beta_{N-1}(p) \end{bmatrix}^T \tag{4.36}$$
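A short sketch may help illustrate the truncation in Equations (4.32) and (4.34). It projects a hypothetical ramp profile onto the first N cosine basis functions and records the $L^2$ error for several values of N. The names `project` and `l2_error`, the ramp profile, and the quadrature resolution are assumptions made for illustration; $\beta_0 = 1$ is taken for the n = 0 basis element.

```python
import math

def beta(n, p):
    """Basis function beta_n(p) of Equation (4.27), taking beta_0 = 1."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * p)

def inner(f, g, quad=2000):
    """Midpoint-rule approximation of the L2([0, 1]) inner product <f, g>."""
    h = 1.0 / quad
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(quad))

def project(x, N):
    """Return (alpha, x_tilde): the N coefficients and the truncated series."""
    alpha = [inner(x, lambda p, n=n: beta(n, p)) for n in range(N)]
    def x_tilde(p):
        # x_tilde(p) = alpha^T beta(p), Equation (4.34)
        return sum(a * beta(n, p) for n, a in enumerate(alpha))
    return alpha, x_tilde

def l2_error(x, x_tilde):
    """L2([0, 1]) norm of the truncation error x - x_tilde."""
    diff = lambda p: x(p) - x_tilde(p)
    return math.sqrt(inner(diff, diff))

ramp = lambda p: p  # hypothetical profile that no finite N represents exactly
errors = [l2_error(ramp, project(ramp, N)[1]) for N in (1, 2, 4, 8)]
```

The error shrinks as N grows, which is the sense in which the truncated series is a usable stand-in for the infinite-dimensional state.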
Additionally, the coordinate vector of $\tilde{x}$ with respect to the basis $\mathcal{B}_N$ is [85]

$$\tilde{x}(t_i, p) = [\alpha(t_i)]_{\mathcal{B}_N} \tag{4.37}$$

Furthermore, the state estimation error is defined by

$$e(t_i^-) = x(t_i) - \hat{x}(t_i^-) \tag{4.38}$$

and the finite-dimensional approximation of the state estimation error is given by a truncated Fourier series expansion as:

$$\tilde{e}(t_i^-, p) = \mathcal{P}e(t_i^-, p) = \sum_{n=0}^{N-1} e_n(t_i^-)\,\beta_n(p) \tag{4.39}$$

where the coefficients are determined by

$$e_n(t_i^-) = \langle e(t_i^-), \beta_n \rangle \tag{4.40}$$

Remark: In most applications, researchers have discretized the state x in the spatial domain. In contrast, we have discretized the state in the spatial-frequency domain and truncated the higher frequencies that are often dominated by noise. If truncation performed by this projection operator causes too much ringing or other undesirable effects in the spatial domain representation of the state, then perhaps additional filtering techniques (or an entirely different technique) may be needed in order to achieve good performance. Finally, by choosing another subset of basis elements or by using a different basis altogether, we will, in general, produce different matrix representations. Hence, the basis and projection operator employed can affect the speed, cost, and effectiveness of the calculation as well as the accuracy of the estimation process.

4.3 Kalman Filtering Algorithm

An inspection of the sampled-data Kalman filtering algorithm discussed in Section 2.3.2 reveals the following list of transformations

$$\Phi(t_i, t_{i-1}),\ B_d(t_i),\ G_d(t_i),\ Q_d(t_i),\ P(t_i^-),\ H(t_i),\ R(t_i),\ P(t_i^+),\ A(t_i),\ K(t_i)$$

used by the algorithm to perform its optimal estimation of the state; we will often refer to these transformations as the components of a Kalman filter¹². Before we derive the matrix representations for the transformations using a finite-dimensional approximation of the state function, we shall first find the state transition operator adjoint and the measurement distributor transformation adjoint, and perform some preliminary work regarding the residual covariance matrix and the Kalman filter gain transformation.


4.3.1 The State Transition Operator Adjoint $\Phi^*$. The state transition operator maps both a random state function and its realizations. For ease of development, we shall begin by determining the state transition adjoint operator $\Phi^*$ applied to a realization of the stochastic (state) temperature function. The following fundamental equation relates $\Phi \in \mathcal{BLO}(X)$ to its adjoint¹³ via the inner product, $\langle \cdot, \cdot \rangle$, at time $t_i$ [154]

$$\langle \Phi(\Delta t_i)\,x(t_{i-1}),\ y(t_{i-1}) \rangle = \langle x(t_{i-1}),\ \Phi^*(\Delta t_i)\,y(t_{i-1}) \rangle \tag{4.41}$$

for every $x, y \in X = L^2_{[0,1]}$, taken over the Hilbert space of absolutely square integrable real-valued functions. By definition,

$$\langle \Phi(\Delta t_i)\,x(t_{i-1}),\ y(t_{i-1}) \rangle = \int_0^1 [\Phi(\Delta t_i)\,x(t_{i-1})](p)\ y(t_{i-1}, p)\,dp \tag{4.42}$$

¹²In this example, $G_d(t_i)$ is an identity operator. Without loss of generality, we could have let $G_d(t_i)$ be an identity operator in previous discussions.

¹³Let X and Y be Hilbert spaces; then for every pair of elements $x \in X$ and $y \in Y$, the linear transformation $\Psi \in \mathcal{L}(X, Y)$ is related to its adjoint via the inner product $\langle \Psi x, y \rangle_Y = \langle x, \Psi^* y \rangle_X$ [154].

Substituting in for $[\Phi(\Delta t_i)\,x(t_{i-1})](p)$ using Equation (4.21) yields

$$\langle \Phi(\Delta t_i)\,x(t_{i-1}),\ y(t_{i-1}) \rangle = \int_0^1 \left[ \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 \Delta t_i} \cos(n\pi p) \int_0^1 x(t_{i-1}, p') \cos(n\pi p')\,dp' \right] y(t_{i-1}, p)\,dp \tag{4.43}$$

$$= \int_0^1 \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 \Delta t_i}\ y(t_{i-1}, p) \cos(n\pi p) \int_0^1 x(t_{i-1}, p') \cos(n\pi p')\,dp'\,dp \tag{4.44}$$

This inner product is finite because it is defined on the space of absolutely square integrable functions; thus, the infinite sum in the integrand converges. Therefore, by the Weierstrass M-Test [7] we can interchange the ordering and pull the infinite sum outside of the integral to get

$$\langle \Phi(\Delta t_i)\,x(t_{i-1}),\ y(t_{i-1}) \rangle = \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 \Delta t_i} \int_0^1 y(t_{i-1}, p) \cos(n\pi p)\,dp \int_0^1 x(t_{i-1}, p') \cos(n\pi p')\,dp' \tag{4.45}$$

Next, we interchange the order of integrations by invoking the Fubini-Tonelli theorem [194, 141, 7, 66] since $y(t_{i-1}, p) \cos(n\pi p)\ x(t_{i-1}, p') \cos(n\pi p')$ is absolutely integrable. Thus, Equation (4.45) becomes

$$= \int_0^1 x(t_{i-1}, p') \left[ \sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 \Delta t_i} \cos(n\pi p') \int_0^1 y(t_{i-1}, p) \cos(n\pi p)\,dp \right] dp' \tag{4.46}$$

Per Equation (4.21), the term inside the large square brackets in Equation (4.46) is

$$\sum_{n=-\infty}^{\infty} e^{-\kappa n^2 \pi^2 \Delta t_i} \cos(n\pi p') \int_0^1 y(t_{i-1}, p) \cos(n\pi p)\,dp = [\Phi(\Delta t_i)\,y(t_{i-1})](p') \tag{4.47}$$

Substituting this into Equation (4.46) gives

$$\langle \Phi(\Delta t_i)\,x(t_{i-1}),\ y(t_{i-1}) \rangle = \int_0^1 x(t_{i-1}, p')\ [\Phi(\Delta t_i)\,y(t_{i-1})](p')\,dp' \tag{4.48}$$

A comparison of the right-hand sides of Equations (4.41) and (4.48) yields $\Phi^* = \Phi$. When an operator is equal to its adjoint, the operator is termed self-adjoint. For a matrix representation of the operator, the conjugate transpose of the matrix yields the adjoint. Hence the matrix representation of a self-adjoint operator is symmetric if the matrix is real, and Hermitian symmetric if complex.
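The self-adjoint property can be checked numerically with a truncated version of Equation (4.21). The sketch below compares both sides of Equation (4.41) with $\Phi^*$ replaced by $\Phi$, and then forms a small matrix with entries $\langle \beta_m, \Phi\beta_n \rangle$ in the cosine basis of Equation (4.27). The function names, the truncation limit, the test functions, and the values of `kappa` and `dt` are hypothetical, and the midpoint rule stands in for the exact integrals.

```python
import math

def beta(n, p):
    """Basis function beta_n(p) of Equation (4.27), taking beta_0 = 1."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * p)

def inner(f, g, quad=400):
    """Midpoint-rule approximation of the L2([0, 1]) inner product <f, g>."""
    h = 1.0 / quad
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(quad))

def apply_phi(x, dt, kappa, n_max=20):
    """Truncated Equation (4.21); terms for n and -n are folded together."""
    coeffs = []
    for n in range(n_max + 1):
        weight = 1.0 if n == 0 else 2.0
        c_n = inner(x, lambda q, n=n: math.cos(n * math.pi * q))
        coeffs.append(weight * math.exp(-kappa * (n * math.pi) ** 2 * dt) * c_n)
    return lambda p: sum(c * math.cos(n * math.pi * p) for n, c in enumerate(coeffs))

def phi_matrix(N, dt, kappa):
    """Matrix with entries <beta_m, Phi beta_n> for m, n = 0..N-1."""
    basis = [lambda p, n=n: beta(n, p) for n in range(N)]
    images = [apply_phi(b, dt, kappa) for b in basis]
    return [[inner(basis[m], images[n]) for n in range(N)] for m in range(N)]

kappa, dt, N = 0.05, 0.2, 4            # hypothetical values
x = lambda p: p * p                    # hypothetical test functions
y = lambda p: math.exp(p)
lhs = inner(apply_phi(x, dt, kappa), y)    # <Phi x, y>
rhs = inner(x, apply_phi(y, dt, kappa))    # <x, Phi y>
Phi = phi_matrix(N, dt, kappa)
```

Under these assumptions the matrix comes out symmetric and, for this particular basis, diagonal with entries $e^{-\kappa n^2\pi^2 \Delta t}$, since each $\beta_n$ is an eigenfunction of $\Phi$.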


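4.3.2 The Measurement Distributor Transformation Adjoint $H^*$. In order to find the filter-computed error covariance, $A = H P H^* + R$, we must first determine $H^*$, the adjoint of H. For a given $\omega \in \Omega$, the realization $x(\omega)$ is an element of $X = L^2_{[0,1]}$. The transformation H is defined for arbitrary x by

$$Hx = \begin{bmatrix} \int_0^{1/M} x(t_i, p)\,dp \\ \vdots \\ \int_{(M-1)/M}^{1} x(t_i, p)\,dp \end{bmatrix} \in \mathbb{R}^M \tag{4.49}$$

for some time $t_i \in \{t_1, \ldots, t_{final}\}$. Thus H is a vector of linear functionals. For any $t_i \in \{t_1, \ldots, t_{final}\}$, $x \in L^2_{[0,1]}$, $y \in \mathbb{R}^M$, we can use the following definition of the adjoint, an equality of inner products

$$\langle Hx, y \rangle_{\mathbb{R}^M} = \langle x, H^* y \rangle_{L^2} \tag{4.50}$$

to determine the adjoint $H^*$. The left-hand side of Equation (4.50) is given by

$$\langle Hx, y \rangle_{\mathbb{R}^M} = \begin{bmatrix} \int_0^{1/M} x(t_i, p)\,dp & \cdots & \int_{(M-1)/M}^{1} x(t_i, p)\,dp \end{bmatrix} y(t_i)$$

Performing the inner product yields

$$\langle Hx, y \rangle_{\mathbb{R}^M} = \left[ \int_0^{1/M} x(t_i, p)\,dp \right] y_1(t_i) + \cdots + \left[ \int_{(M-1)/M}^{1} x(t_i, p)\,dp \right] y_M(t_i) \tag{4.51}$$

$$= \int_0^{1/M} x(t_i, p)\,y_1(t_i)\,dp + \cdots + \int_{(M-1)/M}^{1} x(t_i, p)\,y_M(t_i)\,dp \tag{4.52}$$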



Furthermore, the right-hand side of Equation (4.50) is

$$\langle x, H^* y \rangle_{L^2} = \int_0^1 x(t_i, p)\,[H^* y](t_i, p)\,dp \tag{4.53}$$

$$= \sum_{m=1}^{M} \int_{(m-1)/M}^{m/M} x(t_i, p)\,[H^* y](t_i, p)\,dp \tag{4.54}$$

So,

$$\sum_{m=1}^{M} \int_{(m-1)/M}^{m/M} x(t_i, p)\,y_m(t_i)\,dp = \sum_{m=1}^{M} \int_{(m-1)/M}^{m/M} x(t_i, p)\,[H^* y](t_i, p)\,dp \tag{4.55}$$

and for Equation (4.55) to be true for arbitrary x (in particular, for any x that vanishes outside a single subinterval), it follows that the summands must be equal for each m, so

$$\int_{(m-1)/M}^{m/M} x(t_i, p)\,y_m(t_i)\,dp = \int_{(m-1)/M}^{m/M} x(t_i, p)\,[H^* y](t_i, p)\,dp \tag{4.56}$$

Thus, the integrands over each subinterval are equal almost everywhere in p

$$x(t_i, p)\,y_m(t_i) = x(t_i, p)\,[H^* y](t_i, p) \tag{4.57}$$

Therefore $[H^* y](t_i, p) = y_m(t_i)$ almost everywhere in p for every $m \in \{1, \ldots, M\}$ and any fixed $t_i$. So, for fixed $t_i$, the measurement distributor transformation adjoint $H^*$ transforms a constant vector $y(t_i) \in \mathbb{R}^M$ into a piece-wise constant $L^2_{[0,1]}$ function

$$[H^* y](t_i, p) = y_m(t_i), \quad \text{for } \frac{m-1}{M} \le p < \frac{m}{M} \tag{4.58}$$
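The adjoint relation of Equation (4.50) can be verified numerically for the piece-wise constant function in Equation (4.58). In the sketch below, `apply_H` forms the M segment integrals and `apply_H_adjoint` returns the step function; both names, the test function x, the vector y, and the quadrature resolution are hypothetical choices for illustration. The $L^2$ inner product is integrated segment by segment so that no quadrature cell straddles a jump.

```python
import math

def integrate(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def apply_H(x, M):
    """Segment integrals of x over [(m-1)/M, m/M], m = 1..M (Equation (4.49))."""
    return [integrate(x, (m - 1) / M, m / M) for m in range(1, M + 1)]

def apply_H_adjoint(y):
    """Return the step function p -> y_m on the m-th segment (Equation (4.58))."""
    M = len(y)
    def Hstar_y(p):
        m = min(int(p * M), M - 1)  # zero-based segment index; p = 1 uses the last one
        return y[m]
    return Hstar_y

x = lambda p: math.exp(-p) * (1.0 + p)  # hypothetical test function
y = [0.3, -1.2, 2.0]                    # hypothetical vector in R^3
M = len(y)

lhs = sum(hx * ym for hx, ym in zip(apply_H(x, M), y))  # <Hx, y> in R^M
Hstar_y = apply_H_adjoint(y)
rhs = sum(integrate(lambda p: x(p) * Hstar_y(p), m / M, (m + 1) / M)
          for m in range(M))                            # <x, H*y> in L2
```

The two inner products agree to rounding error, as Equation (4.50) requires.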

4.3.3 The Residual Covariance Matrix $A(t_i)$. The residual covariance operator $A(t_i) = H P(t_i^-) H^* + R(t_i)$ was previously defined in Equation (3.265). We will show that $H P(t_i^-) H^*$ produces an $M \times M$ real-valued matrix; hence A is an $M \times M$ real-valued matrix. For an arbitrary sample of z at time $t_i$, the measurement vector $z_i \in \mathbb{R}^M$,

$$A(t_i)\,z_i = [H P(t_i^-) H^* + R(t_i)]\,z_i \tag{4.59}$$

$$= H P(t_i^-) H^* z_i + R(t_i)\,z_i \tag{4.60}$$

The second term is just matrix multiplication and requires no further discussion at this time. On the other hand, the first term involves two transformations and an operator and will require an extensive development. We shall begin by explicitly calling out the form of the error covariance operator $P(t_i^-)$ applied to the function $H^* z_i \in L^2_{[0,1]}$ by

$$P(t_i^-) H^* z_i = E\{[e(t_i^-) \circ e(t_i^-)] \mid Z(t_{i-1}) = Z_{i-1}\}\,H^* z_i \tag{4.61}$$

where $E\{\cdot\}$ is the conditional expectation operator¹⁴ and the outer product operator, $\circ$, is defined in Definition 14 in Chapter III. Next, we move $H^* z_i$ into the expectation

$$P(t_i^-) H^* z_i = E\{[e(t_i^-) \circ e(t_i^-)]\,H^* z_i\} \tag{4.62}$$

and then use the definition of the outer product given in Equation (3.6) to obtain

$$P(t_i^-) H^* z_i = E\{e(t_i^-)\,\langle e(t_i^-), H^* z_i \rangle_{L^2}\} \tag{4.63}$$

where the inner product is defined on the $L^2$ space of functions for interval [0, 1]. The inner product on the right-hand side can be equivalently expressed using the relationship employed to define the adjoint of the measurement distributor transformation. We get

$$\langle e(t_i^-), H^* z_i \rangle_{L^2} = \langle H e(t_i^-), z_i \rangle_{\mathbb{R}^M} \tag{4.64}$$

and, since the inner product of two vectors in $\mathbb{R}^M$ is

$$\langle H e(t_i^-), z_i \rangle_{\mathbb{R}^M} = [H e(t_i^-)]^T z_i \tag{4.65}$$

we have

$$P(t_i^-) H^* z_i = E\{e(t_i^-)\,[H e(t_i^-)]^T z_i\} \tag{4.66}$$

¹⁴For ease of notation, we will suppress the explicit conditioning during the majority of the development.
Next  we  apply  H  to  P(t “)  H*Zj  to  yield 

Io/M  EHt7,p)  [H e(t~ )]Tz i}dp 

H  P{t~ )  FTz,  =  :  (4.67) 

f(M—i)/M  Eie( *7 >  P)  [H  )]TZi}rfp 

where  we  now  include  the  linear  spatial  variable  p  dependence  when  applicable. 
Next,  we  pull  the  expectation  operator  out  of  the  integrals 

E  {/01/M  e(tr>p)  [He(^r)]Tz^p} 

H  P{t~)  H*Zj  =  :  (4.68) 

E  e(^r >  p)  !H  )]Tzt  rfp} 

and  then  out  of  the  array 

J01/Me(Xr>p)  [He(^r)]Tz*rfp 

H  P(t“ )  H*z.)  =  E  \  (4.69) 

f(M—i)/M  e(^r > p)  tH  e(^r  )]Tz*  rfp 

since  a  vector  of  expectations  is  equivalent  to  the  expectation  of  the  vector.  For  the 
mth  element,  expanding  He(t“)  yields 


pm/M 

/  e(Ar ,  P)  [H  e(^“  )]TZj  dp 

nm/M  r 


r-m/M 

/ 

’  (m  —  l)/M 


(U  ,  p)  f01/M  e(ti  ,  p')dp'  ■  ■  •  e(ti  ,  pf)  dp'  z*  dp 

{t-,p)dp  e{t~ ,  p')dp'  •••  f^M_1)/Me(t-,p')dp'  z* 


(4.70) 

(4.71) 
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where  the  second  equality  follows  from  the  fact  that  the  integrations  are  separable. 
For  notational  convenience,  we  define  the  integrated  error  as 

pm/M 

Vm(ti)=  /  e(£“ ,  p)  dp  (4.72) 

so  that  Equation  (4.71)  for  the  mth  element  becomes 


(4.73) 

(4.74) 


Using  Equation  (4.74).  H  P{ti  )  H*  Zj  becomes 

Vi(t7)vi(t7)  •"  Vi(t7)vM(t7)  z* 

HP(t~)  H*z;  =  E  \  (4.75) 

VM(t7)vi(t7)  •"  Vm  (t~ )  rjM  (t~ )  %i 

Vi (t7)vi(t7)  ■■■  vi(t7)vM{t7) 

=  E  :  ••.  :  z,  (4.76) 

Vm  (t7 )  Vi  (t7 )  •  •  •  Vm  [t~ )  gM  [t~ ) 

Applying  the  expectation  to  the  individual  elements  of  the  matrix  yields 

E{vi (f r )  Vi (t7) }  •  •  •  E{vi (t7 )  VM (t~ ) } 

H P(t~ )  H*Z?;  =  :  •.  :  z;  (4.77) 

E{Vm  (t~ )  Vi  (t7)}  ■■■  e(Vm  (t7 )  Vm  (t~ ) } 
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where  the  ninth  element  is 


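$$ E\{\eta_m(t_i^-)\,\eta_n(t_i^-)\} = E\left\{\int_{(m-1)/M}^{m/M} e(t_i^-,p)\,dp\int_{(n-1)/M}^{n/M} e(t_i^-,p')\,dp'\right\} \tag{4.78} $$

$$ = E\left\{\int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} e(t_i^-,p)\,e(t_i^-,p')\,dp'\,dp\right\} \tag{4.79} $$

$$ = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp'\,dp \tag{4.80} $$

By the operator identity for arbitrary $\mathbf{z}_i$,

$$ H\,P(t_i^-)\,H^* = \begin{bmatrix}E\{\eta_1(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_1(t_i^-)\,\eta_M(t_i^-)\} \\ \vdots & \ddots & \vdots \\ E\{\eta_M(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_M(t_i^-)\,\eta_M(t_i^-)\}\end{bmatrix} \tag{4.81} $$

and therefore the filter-computed error covariance operator is represented by a real $M \times M$ matrix

$$ \mathbf{A}(t_i) = \begin{bmatrix}E\{\eta_1(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_1(t_i^-)\,\eta_M(t_i^-)\} \\ \vdots & \ddots & \vdots \\ E\{\eta_M(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_M(t_i^-)\,\eta_M(t_i^-)\}\end{bmatrix} + \mathbf{R}(t_i) \tag{4.82} $$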


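In this subsection we began with an $M$-vector of numbers, $\mathbf{z}_i$, and then transformed it into a function using $H^*$. Application of the error covariance operator, $P(t_i^-)$, to $H^*\mathbf{z}_i$ returned a modified function. Finally, we transformed the function $P(t_i^-)\,H^*\mathbf{z}_i$ with $H$ to create another $M$-vector of numbers. The composite operator $H\,P(t_i^-)\,H^*$ is thus a matrix of real numbers.

4.3.4 The Kalman Gain Transformation K. The Kalman gain transformation $K \in \mathcal{L}(Z, X)$, where $Z = \mathbb{R}^M$ and $X = L_2[0,1]$, for some time $t_i$,

$$ K(t_i) = P(t_i^-)\,H^*\mathbf{A}^{-1}(t_i) \tag{4.83} $$

operates on the vector residual $\mathbf{r}$ at time $t_i$

$$ \mathbf{r}(t_i) = \mathbf{z}_i - H\,\hat{x}(t_i^-) \tag{4.84} $$

which for this problem is a vector of real scalars, i.e., $\mathbf{r}(t_i) \in \mathbb{R}^M$. Apply the gain to the residual

$$ [K\mathbf{r}](t_i) = P(t_i^-)\,H^*\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i) \tag{4.85} $$

$$ = E\{e(t_i^-)\circ e(t_i^-)\}\,H^*\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i) \tag{4.86} $$

$$ = E\left\{[e(t_i^-)\circ e(t_i^-)]\,H^*\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\right\} \tag{4.87} $$

where the expectation $E\{e(t_i^-)\circ e(t_i^-)\}$ is actually a conditional expectation: $E\{[e(t_i^-)\circ e(t_i^-)]\,|\,Z(t_{i-1}) = Z_{i-1}\}$. Per the development in Section 4.3.2, which began on page 4-18, note that $H^*\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)$ is an $L_2$ function. Using the definition of the function outer product in Equation (3.6), we get

$$ [K\mathbf{r}](t_i) = E\left\{e(t_i^-)\,\langle e(t_i^-),\,H^*\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\rangle_{L_2}\right\} \tag{4.88} $$

$$ = E\left\{e(t_i^-)\,\langle H e(t_i^-),\,\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\rangle_{\mathbb{R}^M}\right\} \tag{4.89} $$

$$ = E\left\{e(t_i^-)\,[H e(t_i^-)]^T\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\right\} \tag{4.90} $$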

where the second line follows from the equivalence of inner products in the definition of the adjoint and the third line is by definition of the inner product for vectors. Expanding $[H e(t_i^-)]^T$ into a vector yields

$$ [K\mathbf{r}](t_i) = E\left\{e(t_i^-,p)\begin{bmatrix}\int_0^{1/M} e(t_i^-,p')\,dp' \\ \vdots \\ \int_{(M-1)/M}^{1} e(t_i^-,p')\,dp'\end{bmatrix}^T\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\right\} \tag{4.91} $$

$$ = E\left\{\begin{bmatrix}\int_0^{1/M} e(t_i^-,p)\,e(t_i^-,p')\,dp' \\ \vdots \\ \int_{(M-1)/M}^{1} e(t_i^-,p)\,e(t_i^-,p')\,dp'\end{bmatrix}^T\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i)\right\} \tag{4.92} $$

and then moving the expectation operator into the row vector

$$ [K\mathbf{r}](t_i) = \begin{bmatrix}E\left\{\int_0^{1/M} e(t_i^-,p)\,e(t_i^-,p')\,dp'\right\} \\ \vdots \\ E\left\{\int_{(M-1)/M}^{1} e(t_i^-,p)\,e(t_i^-,p')\,dp'\right\}\end{bmatrix}^T\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i) \tag{4.93} $$

$$ = \begin{bmatrix}\int_0^{1/M} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp' \\ \vdots \\ \int_{(M-1)/M}^{1} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp'\end{bmatrix}^T\mathbf{A}^{-1}(t_i)\,\mathbf{r}(t_i) \tag{4.94} $$

where the expectation and integration operations commute and we note that the integrand is related to the error covariance. Since Equation (4.94) applies for all residuals $\mathbf{r}(t_i)$, by operator identity, the Kalman gain transformation $K(t_i)$ is a row vector of real numbers defined as

$$ K(t_i) = \begin{bmatrix}\int_0^{1/M} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp' \\ \vdots \\ \int_{(M-1)/M}^{1} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp'\end{bmatrix}^T\mathbf{A}^{-1}(t_i) \tag{4.95} $$

While the gain is a finite-dimensional vector of numbers, we still must address the integrals since they are, in general, infinite-dimensional operators. In Section 4.3.11, we will show that, for the basis that we chose in Section 4.2.8, we can analytically evaluate the integrals and store a vector of numbers to act as the approximate measurement distributor matrix.

4.3.5 The State Transition Matrix $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$. To find the matrix representation of our state transition operator, denoted as $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$ because it requires a finite-dimensional approximation of that operator (and thus the tilde over $\tilde{\boldsymbol{\Phi}}$ versus no tilde over $\Phi$), we evaluate $\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)$, where $\Delta t_{i+1} = t_{i+1} - t_i$. Using the defining relationship for the state transition operator, Equation (4.21), we obtain

$$ [\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)](p) = \sum_{n=-\infty}^{\infty} e^{-\kappa n^2\pi^2\Delta t_{i+1}}\cos(n\pi p)\int_0^1\cos(n\pi p')\,\tilde{x}(t_i,p')\,dp' \tag{4.96} $$

The evenness property of the summand allows us to double the sum over the positive indices and separately compute the $n = 0$ term, thus:

$$ [\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)](p) = \sum_{n=1}^{\infty} 2e^{-\kappa n^2\pi^2\Delta t_{i+1}}\cos(n\pi p)\int_0^1\cos(n\pi p')\,\tilde{x}(t_i,p')\,dp' + \int_0^1\tilde{x}(t_i,p')\,dp' \tag{4.97} $$


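Simplifying the first term yields

$$ \sum_{n=1}^{\infty} 2e^{-\kappa n^2\pi^2\Delta t_{i+1}}\cos(n\pi p)\int_0^1\cos(n\pi p')\,\tilde{x}(t_i,p')\,dp' = \sum_{n=1}^{\infty} 2e^{-\kappa n^2\pi^2\Delta t_{i+1}}\cos(n\pi p)\int_0^1\cos(n\pi p')\sum_{m=0}^{N-1}\alpha_m(t_i)\,\beta_m(p')\,dp' \tag{4.98} $$

$$ = \sum_{n=1}^{\infty} e^{-\kappa n^2\pi^2\Delta t_{i+1}}\sqrt{2}\cos(n\pi p)\int_0^1\sqrt{2}\cos(n\pi p')\sum_{m=0}^{N-1}\alpha_m(t_i)\,\beta_m(p')\,dp' \tag{4.99} $$

$$ = \sum_{n=1}^{\infty} e^{-\kappa n^2\pi^2\Delta t_{i+1}}\,\beta_n(p)\sum_{m=0}^{N-1}\alpha_m(t_i)\int_0^1\beta_n(p')\,\beta_m(p')\,dp' \tag{4.100} $$

Recall that

$$ \int_0^1\beta_n(p')\,\beta_m(p')\,dp' = \langle\beta_n,\beta_m\rangle = \delta_{nm} \tag{4.101} $$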


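Thus the first term of Equation (4.97) becomes

$$ \sum_{n=1}^{\infty} 2e^{-\kappa n^2\pi^2\Delta t_{i+1}}\cos(n\pi p)\int_0^1\cos(n\pi p')\,\tilde{x}(t_i,p')\,dp' = \sum_{n=1}^{\infty} e^{-\kappa n^2\pi^2\Delta t_{i+1}}\,\beta_n(p)\sum_{m=0}^{N-1}\alpha_m(t_i)\,\delta_{nm} \tag{4.102} $$

$$ = \sum_{n=1}^{\infty}\sum_{m=0}^{N-1} e^{-\kappa n^2\pi^2\Delta t_{i+1}}\,\beta_n(p)\,\alpha_m(t_i)\,\delta_{nm} \tag{4.103} $$

$$ = \sum_{m=1}^{N-1} e^{-\kappa m^2\pi^2\Delta t_{i+1}}\,\alpha_m(t_i)\,\beta_m(p) \tag{4.104} $$

where we evaluated the Kronecker delta to obtain the last line. The second term in Equation (4.97) is evaluated as

$$ \int_0^1\tilde{x}(t_i,p')\,dp' = \int_0^1\sum_{m=0}^{N-1}\alpha_m(t_i)\,\beta_m(p')\,dp' \tag{4.105} $$

$$ = \sum_{m=0}^{N-1}\alpha_m(t_i)\int_0^1\beta_m(p')\,dp' \tag{4.106} $$

$$ = \alpha_0(t_i) \tag{4.107} $$

since the integral is zero for every index $m > 0$ and one when $m = 0$. Therefore, Equation (4.97) becomes

$$ [\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)](p) = \alpha_0(t_i) + \sum_{m=1}^{N-1} e^{-\kappa m^2\pi^2\Delta t_{i+1}}\,\alpha_m(t_i)\,\beta_m(p) \tag{4.108} $$

$$ = \sum_{m=0}^{N-1} e^{-\kappa m^2\pi^2\Delta t_{i+1}}\,\alpha_m(t_i)\,\beta_m(p) \tag{4.109} $$

where the base $\alpha_0(t_i)$ term rejoined the summation since both $\beta_0(p)$ and $e^0$ are equal to 1. Let $\phi_m(\Delta t_{i+1}) = e^{-\kappa m^2\pi^2\Delta t_{i+1}}$ and then we may re-express the sum using matrix multiplication as

$$ [\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)](p) = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p) \tag{4.110} $$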


where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are as previously defined in Equations (4.35) and (4.36), respectively, and the $N$ by $N$ diagonal state transition matrix is defined as$^{15}$

$$ \tilde{\boldsymbol{\Phi}}(\Delta t_{i+1}) \triangleq \begin{bmatrix}\phi_0(\Delta t_{i+1}) & 0 & \cdots & 0 \\ 0 & \phi_1(\Delta t_{i+1}) & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \phi_{N-1}(\Delta t_{i+1})\end{bmatrix} \tag{4.111} $$

15 As a point of interest, we note that, per Theorem 6.4.4 of Naylor and Sell [154], there exists a basis that will yield a diagonal matrix representation for self-adjoint linear operators that map finite-dimensional Hilbert spaces to themselves; thus we can plainly see that our choice of bases used to express our state function has an actual impact on our matrix representations.

$\tilde{\boldsymbol{\Phi}}$ was set with bold type to emphasize that it is a matrix quantity and the tilde was added above to indicate that it is a finite-dimensional representation (or approximation) of the state transition operator $\Phi$. An equivalent way of expressing the state transition matrix is by giving the diagonal entries

$$ \phi_n(t_{i+1} - t_i) = e^{-\kappa n^2\pi^2(t_{i+1} - t_i)}, \quad n = 0, 1, \ldots, N-1 \tag{4.112} $$

Additionally, we note that Equation (4.110) is a weighted inner product

$$ [\Phi(\Delta t_{i+1})\,\tilde{x}(t_i)](p) = \langle\boldsymbol{\alpha}(t_i),\boldsymbol{\beta}(p)\rangle_{\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})} \tag{4.113} $$

whereas $\tilde{x}(t_i) = [\mathcal{P}x(t_i)](p) = \boldsymbol{\alpha}^T(t_i)\,\boldsymbol{\beta}(p)$ is the usual inner product in $\mathbb{R}^N$, which can be thought of as an inner product with an identity for weighting. Note that, for zero propagation time, $\Delta t_{i+1} = 0$, $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$ defined in Equation (4.111) is an identity matrix, as anticipated.
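
As a rough numerical illustration of Equations (4.111) and (4.112), the sketch below builds the diagonal of the state transition matrix and applies it to a coefficient vector. It is only a minimal sketch under stated assumptions: the function names, the value `kappa = 0.1`, and the coefficient values are hypothetical and are not taken from the source.

```python
import math

def transition_diagonal(kappa, dt, n_modes):
    """Diagonal entries phi_n = exp(-kappa * n^2 * pi^2 * dt), n = 0..N-1 (Eq. 4.112)."""
    return [math.exp(-kappa * (n * math.pi) ** 2 * dt) for n in range(n_modes)]

def apply_transition(alpha, kappa, dt):
    """Multiply a coefficient vector by the diagonal matrix of Eq. (4.111)."""
    phi = transition_diagonal(kappa, dt, len(alpha))
    return [p * a for p, a in zip(phi, alpha)]

# Hypothetical values, chosen only for illustration.
kappa = 0.1
alpha = [1.0, 0.5, -0.25, 0.125]

unchanged = apply_transition(alpha, kappa, 0.0)   # zero propagation time
one_step = apply_transition(alpha, kappa, 0.2)    # a single step of length 0.2
two_half_steps = apply_transition(apply_transition(alpha, kappa, 0.1), kappa, 0.1)
```

With zero propagation time the coefficients are returned unchanged, and two steps of length 0.1 agree with one step of length 0.2 up to rounding, which mirrors the semi-group behaviour of the operator.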


4.3.6 The Equivalent Discrete-Time Input Distributor Matrix $\tilde{\mathbf{B}}_d(t_i)$. We have already determined the first term on the right-hand side of Equation (4.23), $\Phi(t_{i+1} - t_i)\,x(t_i)$; now we shall address the second term, $B_d(t_i)\,u(t_i)$, where

$$ B_d(t_i) = \int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,B\,ds \tag{4.17} $$

Before we employ any approximations, we have

$$ B_d(t_i)\,u(t_i) = \int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,B\,ds\;u(t_i) \tag{4.114} $$

which can be written as

$$ B_d(t_i)\,u(t_i) = B\int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,u(t_i)\,ds \tag{4.115} $$


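since $B$ is a constant and $u(t_i)$, which is piece-wise constant, is not a function of the integration variable $s$. Let

$$ u(t_i,p) = \sum_{m=0}^{\infty}\nu_m(t_i)\,\beta_m(p) \tag{4.116} $$

and then

$$ \tilde{u}(t_i,p) = \mathcal{P}u(t_i,p) = \sum_{m=0}^{N-1}\nu_m(t_i)\,\beta_m(p) \tag{4.117} $$

Hence by Equation (4.109)

$$ \Phi(t_{i+1} - s)\,\tilde{u}(t_i) = \sum_{m=0}^{N-1} e^{-\kappa m^2\pi^2(t_{i+1} - s)}\,\nu_m(t_i)\,\beta_m(p) \tag{4.118} $$

Therefore, Equation (4.115) for $\tilde{u}(t_i)$ becomes

$$ B_d(t_i)\,\tilde{u}(t_i) = B\int_{t_i}^{t_{i+1}}\sum_{m=0}^{N-1} e^{-\kappa m^2\pi^2(t_{i+1} - s)}\,\nu_m(t_i)\,\beta_m(p)\,ds \tag{4.119} $$

$$ = B\sum_{m=0}^{N-1}\nu_m(t_i)\int_{t_i}^{t_{i+1}} e^{-\kappa m^2\pi^2(t_{i+1} - s)}\,ds\;\beta_m(p) \tag{4.120} $$

For the $m = 0$ case, the integrand is one, and thus the integral yields $[t_{i+1} - t_i]$, while for $m > 0$, integrating yields

$$ \int_{t_i}^{t_{i+1}} e^{-\kappa m^2\pi^2(t_{i+1} - s)}\,ds = \frac{e^{-\kappa m^2\pi^2 t_{i+1}}}{\kappa m^2\pi^2}\left(e^{\kappa m^2\pi^2 t_{i+1}} - e^{\kappa m^2\pi^2 t_i}\right) \tag{4.121} $$

$$ = \frac{1}{\kappa m^2\pi^2}\left(1 - e^{-\kappa m^2\pi^2[t_{i+1} - t_i]}\right) \tag{4.122} $$

Thus,

$$ B_d(t_i)\,\tilde{u}(t_i) = B\,\nu_0(t_i)\,[t_{i+1} - t_i]\,\beta_0(p) + B\sum_{m=1}^{N-1}\nu_m(t_i)\,\frac{1}{\kappa m^2\pi^2}\left(1 - e^{-\kappa m^2\pi^2[t_{i+1} - t_i]}\right)\beta_m(p) \tag{4.123} $$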


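Now let the equivalent discrete-time input distributor matrix, $\tilde{\mathbf{B}}_d(t_i) \in \mathbb{R}^{N\times N}$, be defined as a real diagonal matrix with diagonal entries of

$$ [\tilde{\mathbf{B}}_d(t_i)]_{nn} = \begin{cases} B\,[t_{i+1} - t_i], & n = 0 \\[4pt] B\,\dfrac{1 - e^{-\kappa n^2\pi^2(t_{i+1} - t_i)}}{\kappa n^2\pi^2}, & n = 1, 2, \ldots, N-1 \end{cases} \tag{4.124} $$

As such, $\tilde{\mathbf{B}}_d(t_i)$ represents the operator $B_d(t_i)$ in finite dimensions. Therefore, we can write Equation (4.123) as

$$ B_d(t_i)\,\tilde{u}(t_i) = \boldsymbol{\nu}^T(t_i)\,\tilde{\mathbf{B}}_d(t_i)\,\boldsymbol{\beta}(p) = \left[\tilde{\mathbf{B}}_d(t_i)\,\boldsymbol{\nu}(t_i)\right]_{\mathcal{B}_N} \tag{4.125} $$

where

$$ \boldsymbol{\nu}(t_i) = \begin{bmatrix}\nu_0(t_i) & \nu_1(t_i) & \cdots & \nu_{N-1}(t_i)\end{bmatrix}^T \tag{4.126} $$

is a vector of the first $N$ input function coefficients; see Equation (4.117).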
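
The closed form in Equation (4.124) can be checked numerically against the integral in Equation (4.120). The following is a minimal sketch, assuming hypothetical values for `b`, `kappa`, and `dt`; the helper names are illustrative only, and the midpoint rule is used purely as an independent check.

```python
import math

def input_distributor_diagonal(b, kappa, dt, n_modes):
    """Diagonal of Eq. (4.124): b*dt for n = 0, else b*(1 - exp(-r*dt))/r with r = kappa*n^2*pi^2."""
    diag = [b * dt]
    for n in range(1, n_modes):
        rate = kappa * (n * math.pi) ** 2
        # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is small.
        diag.append(b * -math.expm1(-rate * dt) / rate)
    return diag

def midpoint_integral(rate, dt, steps=20000):
    """Midpoint-rule value of the integral of exp(-rate*(dt - s)) for s in [0, dt]."""
    h = dt / steps
    return sum(math.exp(-rate * (dt - (k + 0.5) * h)) for k in range(steps)) * h

# Hypothetical values, chosen only for illustration.
b, kappa, dt = 2.0, 0.1, 0.5
bd = input_distributor_diagonal(b, kappa, dt, 4)
checks = [b * midpoint_integral(kappa * (n * math.pi) ** 2, dt) for n in range(4)]
```

Because only the step length `dt` enters the formula, the same diagonal can be reused for every propagation step when the sample period is constant.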

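4.3.7 Propagate the Finite-Dimensional State Estimate. The approximate state propagation conditional moment is $E[\tilde{x}(t_{i+1})\,|\,Z(t_i) = Z_i]$. Since $E[x(t_{i+1})\,|\,Z(t_i) = Z_i]$ equals $\hat{x}(t_{i+1}^-)$, it follows that $E[\tilde{x}(t_{i+1})\,|\,Z(t_i) = Z_i]$ is $\hat{\tilde{x}}(t_{i+1}^-)$. Thus,

$$ \hat{\tilde{x}}(t_{i+1}^-) = \hat{\boldsymbol{\alpha}}^T(t_{i+1}^-)\,\boldsymbol{\beta}(p) = \left[\hat{\boldsymbol{\alpha}}(t_{i+1}^-)\right]_{\mathcal{B}_N} \tag{4.127} $$

using the previously updated state estimate $\hat{\tilde{x}}(t_i^+)$

$$ [\hat{\tilde{x}}(t_{i+1}^-)](p) = \left[\Phi(t_{i+1} - t_i)\,\hat{\tilde{x}}(t_i^+) + B_d(t_i)\,\tilde{u}(t_i)\right](p) \tag{4.128} $$

$$ = \hat{\boldsymbol{\alpha}}^T(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p) + \boldsymbol{\nu}^T(t_i)\,\tilde{\mathbf{B}}_d(t_i)\,\boldsymbol{\beta}(p) \tag{4.129} $$

$$ = \left[\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\hat{\boldsymbol{\alpha}}(t_i^+) + \tilde{\mathbf{B}}_d(t_i)\,\boldsymbol{\nu}(t_i)\right]_{\mathcal{B}_N} \tag{4.130} $$

Since the basis vector $\boldsymbol{\beta}(p)$ is the same for all states, propagating the coefficients for the approximate state function is the same as propagating the state itself, that is

$$ \hat{\boldsymbol{\alpha}}(t_{i+1}^-) = \tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\hat{\boldsymbol{\alpha}}(t_i^+) + \tilde{\mathbf{B}}_d(t_i)\,\boldsymbol{\nu}(t_i) \tag{4.131} $$

where the state transition matrix, $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$, is defined in Equation (4.111) and the equivalent discrete-time input distributor matrix, $\tilde{\mathbf{B}}_d(t_i)$, is defined in Equation (4.124).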
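
A minimal sketch of the coefficient propagation in Equation (4.131) follows. It assumes the cosine basis $\beta_0 = 1$, $\beta_n = \sqrt{2}\cos(n\pi p)$ used in this chapter; the function names, parameter values, and the choice to iterate without measurement updates are assumptions made only for illustration.

```python
import math

def propagate(alpha_plus, nu, b, kappa, dt):
    """One step of Eq. (4.131) using the diagonal matrices of Eqs. (4.111) and (4.124)."""
    alpha_minus = []
    for n, (a, v) in enumerate(zip(alpha_plus, nu)):
        rate = kappa * (n * math.pi) ** 2
        phi = math.exp(-rate * dt)
        bd = b * dt if n == 0 else b * (1.0 - phi) / rate
        alpha_minus.append(phi * a + bd * v)
    return alpha_minus

def evaluate_state(alpha, p):
    """Evaluate alpha^T beta(p) with beta_0 = 1 and beta_n = sqrt(2) * cos(n * pi * p)."""
    return alpha[0] + sum(a * math.sqrt(2.0) * math.cos(n * math.pi * p)
                          for n, a in enumerate(alpha) if n > 0)

# Hypothetical values: a constant input on mode 1 only, no measurement updates.
b, kappa, dt = 2.0, 0.1, 0.5
alpha = [1.0, 0.0, 0.0]
nu = [0.0, 1.0, 0.0]
for _ in range(200):
    alpha = propagate(alpha, nu, b, kappa, dt)

steady_state_mode_1 = b / (kappa * math.pi ** 2)
```

Under a constant input, each mode with $n \geq 1$ settles at $B\,\nu_n/(\kappa n^2\pi^2)$, while the $n = 0$ mode simply integrates its input.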

4.3.8 The Equivalent Discrete-Time Dynamics Noise Covariance Matrix $\tilde{\mathbf{Q}}_d(t_i)$. Per Equation (4.19), we know that the equivalent discrete-time dynamics noise covariance operator is

$$ Q_d(t_i) = E[w_d(t_i)\circ w_d(t_i)] \tag{4.132} $$

Since the state transition operator is self-adjoint, i.e., $\Phi^* = \Phi$, and $Q$ is a real number, we can rewrite Equation (4.19) as

$$ Q_d(t_i) = Q\int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,\Phi(t_{i+1} - s)\,ds \tag{4.133} $$

where $Q_d$ is an infinite-dimensional bounded linear operator that acts on the state$^{16}$. From previous sections we have acquired the tools needed to find the matrix representation for the covariance operator $Q_d$ when it is applied to a finite-dimensional state. Thus,

$$ Q_d(t_i)\,\tilde{x}(t_i) = Q\int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,\Phi(t_{i+1} - s)\,ds\;\tilde{x}(t_i) \tag{4.134} $$

$$ = Q\int_{t_i}^{t_{i+1}}\Phi(t_{i+1} - s)\,\Phi(t_{i+1} - s)\,\tilde{x}(t_i)\,ds \tag{4.135} $$

where we have factored $\tilde{x}(t_i) = \mathcal{P}x(t_i)$ into the integral. We can simplify the integrand by noting that $\{\Phi(\tau) : \tau \geq 0\}$ forms a semi-group of operators and thus

$$ [\Phi(t_{i+1} - s)\,\Phi(t_{i+1} - s)\,\tilde{x}(t_i)](p) = [\Phi(t_{i+1} - s + t_{i+1} - s)\,\tilde{x}(t_i)](p) \tag{4.136} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(t_{i+1} - s + t_{i+1} - s)\,\boldsymbol{\beta}(p) \tag{4.137} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(t_{i+1} - s)\,\tilde{\boldsymbol{\Phi}}(t_{i+1} - s)\,\boldsymbol{\beta}(p) \tag{4.138} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}^2(t_{i+1} - s)\,\boldsymbol{\beta}(p) \tag{4.139} $$


where the first equality is due to the semi-group property, the second line is analogous to Equation (4.110), the third line follows from the definition of $\tilde{\boldsymbol{\Phi}}$, and the last line is by convention.

16 In the finite-dimensional case, the equivalent discrete-time covariance was given in the form of an $n$-by-$n$ matrix $\mathbf{Q}_d$. According to matrix multiplication rules, an $n$-by-$n$ matrix may only be pre-multiplied by a matrix (or a vector) with $n$ columns or post-multiplied by a matrix (or a vector) with $n$ rows; this corresponds to the same size as the state vector. Thus it should come as no surprise that in our infinite-dimensional case, this equivalent discrete-time covariance operator may, in fact, act on an infinite-dimensional state.

Thus, we can simplify Equation (4.135) to

$$ Q_d(t_i)\,\tilde{x}(t_i) = Q\int_{t_i}^{t_{i+1}}\boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}^2(t_{i+1} - s)\,\boldsymbol{\beta}(p)\,ds \tag{4.140} $$

$$ = Q\int_{t_i}^{t_{i+1}}\left[\sum_{n=0}^{N-1}\alpha_n(t_i)\,\phi_n^2(t_{i+1} - s)\,\beta_n(p)\right]ds \tag{4.141} $$

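Furthermore, since the sum is finite, we can extract all non-"$s$" terms out of the integral to get:

$$ Q_d(t_i)\,\tilde{x}(t_i) = Q\sum_{n=0}^{N-1}\alpha_n(t_i)\int_{t_i}^{t_{i+1}}\phi_n^2(t_{i+1} - s)\,ds\;\beta_n(p) \tag{4.142} $$

$$ = Q\sum_{n=0}^{N-1}\alpha_n(t_i)\int_{t_i}^{t_{i+1}} e^{-2\kappa n^2\pi^2(t_{i+1} - s)}\,ds\;\beta_n(p) \tag{4.143} $$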

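Note that, for $n = 0$, the integrand becomes one, whereas for the $n \neq 0$ case, an exponential remains. Integrating yields

$$ Q_d(t_i)\,\tilde{x}(t_i) = Q\left[\alpha_0(t_i)\,[t_{i+1} - t_i]\,\beta_0(p) + \sum_{n=1}^{N-1}\alpha_n(t_i)\,\frac{e^{-2\kappa n^2\pi^2 t_{i+1}}}{2\kappa n^2\pi^2}\left(e^{2\kappa n^2\pi^2 t_{i+1}} - e^{2\kappa n^2\pi^2 t_i}\right)\beta_n(p)\right] \tag{4.144} $$

and then multiplying out the exponential gives

$$ Q_d(t_i)\,\tilde{x}(t_i) = Q\left[\alpha_0(t_i)\,[t_{i+1} - t_i]\,\beta_0(p) + \sum_{n=1}^{N-1}\alpha_n(t_i)\,\frac{1}{2\kappa n^2\pi^2}\left(1 - e^{-2\kappa n^2\pi^2(t_{i+1} - t_i)}\right)\beta_n(p)\right] \tag{4.145} $$

Now let the equivalent discrete-time dynamics noise covariance matrix, $\tilde{\mathbf{Q}}_d(t_i) \in \mathbb{R}^{N\times N}$, be defined as a diagonal matrix with entries

$$ [\tilde{\mathbf{Q}}_d(t_i)]_{nn} = \begin{cases} Q\,[t_{i+1} - t_i], & n = 0 \\[4pt] Q\,\dfrac{1 - e^{-2\kappa n^2\pi^2(t_{i+1} - t_i)}}{2\kappa n^2\pi^2}, & n = 1, 2, \ldots, N-1 \end{cases} \tag{4.146} $$

As such, $\tilde{\mathbf{Q}}_d(t_i)$ is the finite-dimensional representation of the equivalent discrete-time dynamics noise covariance operator $Q_d(t_i)$. Therefore, we can write Equation (4.145) as

$$ Q_d(t_i)\,\tilde{x}(t_i) = \boldsymbol{\alpha}^T(t_i)\,\tilde{\mathbf{Q}}_d(t_i)\,\boldsymbol{\beta}(p) \tag{4.147} $$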
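
To give a feel for Equation (4.146), the sketch below evaluates the diagonal entries and looks at two limiting cases. It is a minimal sketch under stated assumptions: the function name and the values of `q` and `kappa` are hypothetical.

```python
import math

def noise_covariance_diagonal(q, kappa, dt, n_modes):
    """Diagonal of Eq. (4.146): q*dt for n = 0, else q*(1 - exp(-2*r*dt))/(2*r), r = kappa*n^2*pi^2."""
    diag = [q * dt]
    for n in range(1, n_modes):
        rate2 = 2.0 * kappa * (n * math.pi) ** 2
        # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is small.
        diag.append(q * -math.expm1(-rate2 * dt) / rate2)
    return diag

# Hypothetical values, chosen only for illustration.
q, kappa = 1.0, 0.1
short_step = noise_covariance_diagonal(q, kappa, 1e-6, 3)   # very short sample period
long_step = noise_covariance_diagonal(q, kappa, 100.0, 3)   # very long sample period
```

For a very short sample period every entry approaches $Q\,\Delta t_{i+1}$, whereas for a long one the entries with $n \geq 1$ saturate near $Q/(2\kappa n^2\pi^2)$ and only the $n = 0$ entry keeps growing.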


4.3.9 First and Second Order Statistical Moments for the State Coefficients. The discrete-time dynamics model for the approximate state function given by Equation (4.23) and rewritten here with decremented time indices as

$$ \tilde{x}(t_i) = \Phi(t_i - t_{i-1})\,\tilde{x}(t_{i-1}) + B_d(t_{i-1})\,\tilde{u}(t_{i-1}) + w_d(t_{i-1}) \tag{4.148} $$

gives us a model for propagating the state on a finite-dimensional subspace from one time sample to the next. Since we are, in effect, propagating a Gaussian conditional probability density function (PDF), we need to have knowledge of the first two statistical moments. The conditional state mean, corresponding to the propagation cycle, is

$$ E[\tilde{x}(t_i)\,|\,Z(t_{i-1}) = Z_{i-1}] = E[\Phi(t_i - t_{i-1})\,\tilde{x}(t_{i-1}) + B_d(t_{i-1})\,\tilde{u}(t_{i-1}) + w_d(t_{i-1})\,|\,Z(t_{i-1}) = Z_{i-1}] \tag{4.149} $$

The Kalman filter computes two different conditional$^{17}$ error covariance operators, $P(t_i^-)$ and $P(t_i^+)$, as defined in Definition 89, on page 3-83, by Equations (3.251) and (3.253) respectively. Without loss of generality, we will treat the conditional error covariance operator generically in the following development. Note that all error covariances are assumed to be taken using a conditional expectation operator; however, the conditioning will normally be suppressed to ease notation. Next, we use the definition of the error covariance operator to obtain

$$ P\,y = E\{e\circ e\}\,y \tag{4.150} $$

$$ = E\{[e\circ e]\,y\} \tag{4.151} $$

$$ = E\{e\,\langle e, y\rangle\} \tag{4.152} $$

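where we have used the definition of the outer product for functions, as defined in Equation (3.6) on page 3-10, to express the outer product in terms of an inner product as shown. Now if the state estimation error$^{18}$ $e$ is approximated by $\tilde{e} = \boldsymbol{\epsilon}^T\boldsymbol{\beta}$ and $y$ is approximated with $\tilde{y} = \boldsymbol{\gamma}^T\boldsymbol{\beta}$, then expanding using summation notation yields

$$ \tilde{P}\,\tilde{y} = E\left\{\sum_{n=0}^{N-1}\epsilon_n\beta_n\left\langle\sum_{m=0}^{N-1}\epsilon_m\beta_m,\ \sum_{l=0}^{N-1}\gamma_l\beta_l\right\rangle\right\} \tag{4.153} $$

$$ = E\left\{\sum_{n=0}^{N-1}\epsilon_n\beta_n\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_m\gamma_l\,\langle\beta_m,\beta_l\rangle\right\} \tag{4.154} $$

$$ = E\left\{\sum_{n=0}^{N-1}\epsilon_n\beta_n\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_m\gamma_l\,\delta_{ml}\right\} \tag{4.155} $$

17 Recall that the expectation operator $E$ is conditioned on the measurement history, $Z(t_{i-1}) = Z_{i-1}$, for the propagate cycle from time $t_{i-1}$ to time $t_i$, while $E$ is conditioned on the measurement history, $Z(t_i) = Z_i$, for the state measurement update at time $t_i$.

18 For ease of notation, the conditional nature of the expectation operator, as well as the dependencies on time $t$ and space $p$ for the state and error functions, are suppressed in this development.

Then evaluating the Kronecker delta gives

$$ \tilde{P}\,\tilde{y} = E\left\{\sum_{n=0}^{N-1}\epsilon_n\beta_n\sum_{m=0}^{N-1}\epsilon_m\gamma_m\right\} \tag{4.156} $$

$$ = E\left\{\sum_{m=0}^{N-1}\gamma_m\epsilon_m\sum_{n=0}^{N-1}\epsilon_n\beta_n\right\} \tag{4.157} $$

$$ = E\{\boldsymbol{\gamma}^T\boldsymbol{\epsilon}\,\boldsymbol{\epsilon}^T\boldsymbol{\beta}\} \tag{4.158} $$

where the third line employs the compact vector notation.

Pulling the $\boldsymbol{\gamma}$ terms outside of the expectation operator gives

$$ \tilde{P}\,\tilde{y} = \boldsymbol{\gamma}^T E\{\boldsymbol{\epsilon}\,\boldsymbol{\epsilon}^T\}\,\boldsymbol{\beta} \tag{4.159} $$

$$ = \boldsymbol{\gamma}^T\tilde{\mathbf{P}}\,\boldsymbol{\beta} \tag{4.160} $$

where the conditional error covariance operator $\tilde{P}$ has a finite-dimensional matrix representation of

$$ \tilde{\mathbf{P}} = E\{\boldsymbol{\epsilon}\,\boldsymbol{\epsilon}^T\} \tag{4.161} $$

For completeness, we give the full notation for the propagation cycle

$$ \tilde{\mathbf{P}}(t_i^-) = E\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\,|\,Z(t_{i-1}) = Z_{i-1}\} \tag{4.162} $$

and the update step

$$ \tilde{\mathbf{P}}(t_i^+) = E\{\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\,|\,Z(t_i) = Z_i\} \tag{4.163} $$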

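4.3.10 The Propagation Error Covariance Matrix $\tilde{\mathbf{P}}(t_{i+1}^-)$. The propagation error covariance operator equation, given by Equation (3.271) of Theorem 91, at time $t_{i+1}$ holds for all states $x \in L_2[0,1]$,

$$ P(t_{i+1}^-)\,x(t_i) = \left[\Phi(\Delta t_{i+1})\,P(t_i^+)\,\Phi^*(\Delta t_{i+1}) + Q_d(t_i)\right]x(t_i) \tag{4.164} $$

$$ = \Phi(\Delta t_{i+1})\,P(t_i^+)\,\Phi^*(\Delta t_{i+1})\,x(t_i) + Q_d(t_i)\,x(t_i) \tag{4.165} $$

where the update error covariance operator is $P(t_i^+) = E\{e(t_i^+)\circ e(t_i^+)\}$. For a finite-dimensional approximation of the state function, we have

$$ \tilde{P}(t_{i+1}^-)\,\tilde{x}(t_i) = \left[\Phi(\Delta t_{i+1})\,E\{\tilde{e}(t_i^+)\circ\tilde{e}(t_i^+)\}\,\Phi^*(\Delta t_{i+1}) + Q_d(t_i)\right]\tilde{x}(t_i) \tag{4.166} $$

$$ = \Phi(\Delta t_{i+1})\,E\{\tilde{e}(t_i^+)\circ\tilde{e}(t_i^+)\}\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i) + Q_d(t_i)\,\tilde{x}(t_i) \tag{4.167} $$

Since we already know $Q_d(t_i)\,\tilde{x}(t_i)$ from Equation (4.147), we shall address the first term in Equation (4.167)

$$ \Phi(\Delta t_{i+1})\,E\{\tilde{e}(t_i^+)\circ\tilde{e}(t_i^+)\}\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i) = \Phi(\Delta t_{i+1})\,E\left\{[\tilde{e}(t_i^+)\circ\tilde{e}(t_i^+)]\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i)\right\} \tag{4.168} $$

Applying the definition of the outer product from Equation (3.6) yields

$$ \Phi(\Delta t_{i+1})\,E\{\tilde{e}(t_i^+)\circ\tilde{e}(t_i^+)\}\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i) = \Phi(\Delta t_{i+1})\,E\left\{\tilde{e}(t_i^+)\,\langle\tilde{e}(t_i^+),\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i)\rangle\right\} \tag{4.169} $$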

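Expanding $\tilde{e}(t_i^+)\,\langle\tilde{e}(t_i^+),\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i)\rangle$ using summation notation yields

$$ \tilde{e}(t_i^+)\,\langle\tilde{e}(t_i^+),\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i)\rangle = \sum_{n=0}^{N-1}\epsilon_n(t_i^+)\,\beta_n(p)\left\langle\sum_{m=0}^{N-1}\epsilon_m(t_i^+)\,\beta_m(p),\ \sum_{l=0}^{N-1}\phi_l(\Delta t_{i+1})\,\alpha_l(t_i)\,\beta_l(p)\right\rangle \tag{4.170} $$

and then moving the inner product inside (since all the other terms are independent of $p$) produces

$$ = \sum_{n=0}^{N-1}\epsilon_n(t_i^+)\,\beta_n(p)\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_m(t_i^+)\,\phi_l(\Delta t_{i+1})\,\alpha_l(t_i)\,\langle\beta_m(p),\beta_l(p)\rangle \tag{4.171} $$

$$ = \sum_{n=0}^{N-1}\sum_{m=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_n(t_i^+)\,\beta_n(p)\,\epsilon_m(t_i^+)\,\phi_l(\Delta t_{i+1})\,\alpha_l(t_i)\,\delta_{ml} \tag{4.172} $$

$$ = \sum_{n=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_n(t_i^+)\,\beta_n(p)\,\epsilon_l(t_i^+)\,\phi_l(\Delta t_{i+1})\,\alpha_l(t_i) \tag{4.173} $$

where we recognized and then evaluated the Kronecker delta $\delta_{ml} = \langle\beta_m,\beta_l\rangle$. Reordering terms and then writing in terms of vector notation yields

$$ \tilde{e}(t_i^+)\,\langle\tilde{e}(t_i^+),\,\Phi^*(\Delta t_{i+1})\,\tilde{x}(t_i)\rangle = \sum_{l=0}^{N-1}\phi_l(\Delta t_{i+1})\,\alpha_l(t_i)\,\epsilon_l(t_i^+)\sum_{n=0}^{N-1}\epsilon_n(t_i^+)\,\beta_n(p) \tag{4.174} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\,\boldsymbol{\beta}(p) \tag{4.175} $$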

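Now we re-apply the transition operator $\Phi(\Delta t_{i+1})$ and expectation operator$^{19}$ $E$ in order to obtain the first term of Equation (4.167)

$$ \Phi(\Delta t_{i+1})\,E\left\{\boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\,\boldsymbol{\beta}(p)\right\} = \Phi(\Delta t_{i+1})\,\boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,E\{\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\}\,\boldsymbol{\beta}(p) \tag{4.176} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,E\{\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\}\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p) \tag{4.177} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\tilde{\mathbf{P}}(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p) \tag{4.178} $$

where, as before, $\tilde{\mathbf{P}}(t_i^+) = E\{\boldsymbol{\epsilon}(t_i^+)\,\boldsymbol{\epsilon}^T(t_i^+)\}$ and we see that the state transition operator $\Phi(\Delta t_{i+1})$ merely weights the inner product $\boldsymbol{\alpha}^T(t_i)\,\tilde{\mathbf{P}}(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p)$ with the diagonal matrix $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$.

19 This is still the appropriately conditioned expectation operator.

Using Equations (4.147) and (4.178) we get

$$ \tilde{P}(t_{i+1}^-)\,\tilde{x}(t_i) = \boldsymbol{\alpha}^T(t_i)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\tilde{\mathbf{P}}(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\boldsymbol{\beta}(p) + \boldsymbol{\alpha}^T(t_i)\,\tilde{\mathbf{Q}}_d(t_i)\,\boldsymbol{\beta}(p) \tag{4.179} $$

$$ = \boldsymbol{\alpha}^T(t_i)\left[\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\tilde{\mathbf{P}}(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1}) + \tilde{\mathbf{Q}}_d(t_i)\right]\boldsymbol{\beta}(p) \tag{4.180} $$

Since Equation (4.180) holds for any state, $\tilde{x}(t_i)$, the error covariance matrix for propagation is

$$ \tilde{\mathbf{P}}(t_{i+1}^-) = \tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})\,\tilde{\mathbf{P}}(t_i^+)\,\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1}) + \tilde{\mathbf{Q}}_d(t_i) \tag{4.181} $$

where the state transition matrix, $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$, is defined in Equation (4.111) and the equivalent discrete-time diffusion matrix, $\tilde{\mathbf{Q}}_d(t_i)$, is defined in Equation (4.146). Thus, propagating the approximate finite-dimensional error covariance operator, $\tilde{P}(t_{i+1}^-)$, is achieved by a weighted inner product that depends on a diagonal matrix, $\tilde{\boldsymbol{\Phi}}(\Delta t_{i+1})$, related to the state transition operator $\Phi$; a diagonal matrix, $\tilde{\mathbf{Q}}_d(t_i)$, related to the equivalent discrete-time diffusion operator, $Q_d$; and a symmetric matrix, $\tilde{\mathbf{P}}(t_i^+)$, due to the conditional error covariance following the measurement update.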
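
Because both $\tilde{\boldsymbol{\Phi}}$ and $\tilde{\mathbf{Q}}_d$ are diagonal, Equation (4.181) reduces to an element-wise scaling plus a diagonal correction. The sketch below illustrates this under stated assumptions: the function name and all numerical values are hypothetical, and the covariance is stored as a plain list of lists.

```python
import math

def propagate_covariance(p_plus, kappa, q, dt):
    """Eq. (4.181) with diagonal Phi (Eq. 4.111) and diagonal Q_d (Eq. 4.146)."""
    n_modes = len(p_plus)
    rates = [kappa * (n * math.pi) ** 2 for n in range(n_modes)]
    phi = [math.exp(-r * dt) for r in rates]
    qd = [q * dt if r == 0.0 else q * (1.0 - math.exp(-2.0 * r * dt)) / (2.0 * r)
          for r in rates]
    # Element (j, k) of Phi P Phi is phi_j * P_jk * phi_k since Phi is diagonal.
    p_minus = [[phi[j] * p_plus[j][k] * phi[k] for k in range(n_modes)]
               for j in range(n_modes)]
    for n in range(n_modes):
        p_minus[n][n] += qd[n]
    return p_minus

# Hypothetical updated covariance and parameters, for illustration only.
p_plus = [[1.0, 0.2],
          [0.2, 0.5]]
p_minus = propagate_covariance(p_plus, kappa=0.1, q=0.3, dt=0.5)
```

The symmetry of the updated covariance carries over to the propagated one, since the scaling factors $\phi_j\phi_k$ are symmetric in $j$ and $k$.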

4.3.11 The Measurement Distributor Matrix $\tilde{\mathbf{H}}$. According to Theorem 3.11 of Hoffman and Kunze [85], there exists a unique matrix representation for a finite-dimensional transformation relative to some particular ordered basis. So our first task is to create a finite-dimensional transformation. The measurement distributor transformation, $H$, as defined for an arbitrary realization of the state function, $x$, is

$$ H\,x(t_i) = \begin{bmatrix}\int_0^{1/M} x(t_i,p)\,dp \\ \vdots \\ \int_{(M-1)/M}^{1} x(t_i,p)\,dp\end{bmatrix} \tag{4.182} $$

Next, $H$ applied to a finite-dimensional approximation of the state, $\tilde{x}(t_i)$, is

$$ H\,\tilde{x}(t_i) = \begin{bmatrix}\int_0^{1/M}\tilde{x}(t_i,p)\,dp \\ \vdots \\ \int_{(M-1)/M}^{1}\tilde{x}(t_i,p)\,dp\end{bmatrix} \tag{4.183} $$

Using a truncated Fourier series expression, $\tilde{x}(t_i,p) = \boldsymbol{\alpha}^T(t_i)\,\boldsymbol{\beta}(p)$, we get

$$ H\,\tilde{x}(t_i) = \begin{bmatrix}\int_0^{1/M}\boldsymbol{\alpha}^T(t_i)\,\boldsymbol{\beta}(p)\,dp \\ \vdots \\ \int_{(M-1)/M}^{1}\boldsymbol{\alpha}^T(t_i)\,\boldsymbol{\beta}(p)\,dp\end{bmatrix} \tag{4.184} $$

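The $m$th element of the vector can be simplified using the integrated basis $\mu_{m,n}$ for $m = \{1, \ldots, M\}$ and $n = \{0, \ldots, N-1\}$

$$ \mu_{m,n} = \int_{(m-1)/M}^{m/M}\beta_n(p)\,dp \tag{4.185} $$

as

$$ \int_{(m-1)/M}^{m/M}\boldsymbol{\alpha}^T(t_i)\,\boldsymbol{\beta}(p)\,dp = \int_{(m-1)/M}^{m/M}\sum_{n=0}^{N-1}\alpha_n(t_i)\,\beta_n(p)\,dp \tag{4.186} $$

$$ = \sum_{n=0}^{N-1}\alpha_n(t_i)\int_{(m-1)/M}^{m/M}\beta_n(p)\,dp \tag{4.187} $$

$$ = \sum_{n=0}^{N-1}\alpha_n(t_i)\,\mu_{m,n} \tag{4.188} $$

$$ = \boldsymbol{\mu}_m^T\,\boldsymbol{\alpha}(t_i) \tag{4.189} $$

where

$$ \boldsymbol{\mu}_m = \begin{bmatrix}\mu_{m,0} & \mu_{m,1} & \cdots & \mu_{m,N-1}\end{bmatrix}^T \tag{4.190} $$

Then we can write Equation (4.184) as

$$ H\,\tilde{x}(t_i) = \begin{bmatrix}\boldsymbol{\mu}_1^T\boldsymbol{\alpha}(t_i) \\ \vdots \\ \boldsymbol{\mu}_M^T\boldsymbol{\alpha}(t_i)\end{bmatrix} = \begin{bmatrix}\boldsymbol{\mu}_1^T \\ \vdots \\ \boldsymbol{\mu}_M^T\end{bmatrix}\boldsymbol{\alpha}(t_i) \tag{4.191} $$

By placing the integrated basis elements $\{\boldsymbol{\mu}_m : 1 \leq m \leq M\}$ in a matrix we get the measurement distributor matrix defined by

$$ \tilde{\mathbf{H}} = \begin{bmatrix}\boldsymbol{\mu}_1^T \\ \vdots \\ \boldsymbol{\mu}_M^T\end{bmatrix} = \begin{bmatrix}\mu_{1,0} & \mu_{1,1} & \cdots & \mu_{1,N-1} \\ \vdots & \vdots & & \vdots \\ \mu_{M,0} & \mu_{M,1} & \cdots & \mu_{M,N-1}\end{bmatrix} \in \mathbb{R}^{M\times N} \tag{4.192} $$

Note that $\tilde{\mathbf{H}}$ is time-invariant and can be pre-computed analytically or with a numerical technique. Hence Equation (4.191) can be written as

$$ H\,\tilde{x}(t_i) = \tilde{\mathbf{H}}\,\boldsymbol{\alpha}(t_i) \tag{4.193} $$

For the problem at hand, Equation (4.185) can be written as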


$$ \mu_{m,n} = \begin{cases}\int_{(m-1)/M}^{m/M} dp, & n = 0 \\[4pt] \int_{(m-1)/M}^{m/M}\sqrt{2}\cos(n\pi p)\,dp, & n = 1, 2, \ldots, N-1\end{cases} \tag{4.194} $$

and then evaluated as

$$ \mu_{m,n} = \begin{cases}\dfrac{1}{M}, & n = 0 \\[6pt] \dfrac{\sqrt{2}}{n\pi}\sin(n\pi p)\Big|_{(m-1)/M}^{m/M}, & n = 1, 2, \ldots, N-1\end{cases} \tag{4.195} $$

Note that while we may pre-compute and store this matrix since it is independent of time, we could compute it at every step if desired. Now if the basis or the measurement distributor transformation were a function of time, then the matrix $\tilde{\mathbf{H}}$ would have to be computed at each time step since the matrix would no longer be time-invariant. So, instead of propagating/updating the entire estimate of the state function approximation, the estimated coefficients may be propagated as $\hat{\boldsymbol{\alpha}}(t_i^-)$ and updated to $\hat{\boldsymbol{\alpha}}(t_i^+)$ using an online digital computer.
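
The analytic entries in Equation (4.195) are straightforward to tabulate once. The following minimal sketch builds the matrix of Equation (4.192) for the cosine basis; the function names and the sizes `M = 4`, `N = 3` are assumptions chosen only for illustration.

```python
import math

def integrated_basis(m, n, m_sensors):
    """mu_{m,n} from Eq. (4.195): integral of beta_n over [(m-1)/M, m/M]."""
    lo, hi = (m - 1) / m_sensors, m / m_sensors
    if n == 0:
        return 1.0 / m_sensors
    return math.sqrt(2.0) / (n * math.pi) * (math.sin(n * math.pi * hi)
                                              - math.sin(n * math.pi * lo))

def measurement_matrix(m_sensors, n_modes):
    """M x N measurement distributor matrix of Eq. (4.192), stored as a list of rows."""
    return [[integrated_basis(m, n, m_sensors) for n in range(n_modes)]
            for m in range(1, m_sensors + 1)]

# Hypothetical sizes: M = 4 spatial averaging intervals, N = 3 basis functions.
h = measurement_matrix(4, 3)

# Column sums recover the integral of each basis function over [0, 1].
column_sums = [sum(row[n] for row in h) for n in range(3)]
```

The column sums give a quick sanity check: the $n = 0$ column sums to one, and every other column sums to zero because each cosine basis function integrates to zero over the unit interval.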

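4.3.12 The Residual Covariance Matrix $\tilde{\mathbf{A}}(t_i)$. In Section 4.3.3 we began with $\mathbf{A}(t_i) = H\,P(t_i^-)\,H^* + \mathbf{R}(t_i)$ and ended with

$$ \mathbf{A}(t_i) = \begin{bmatrix}E\{\eta_1(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_1(t_i^-)\,\eta_M(t_i^-)\} \\ \vdots & \ddots & \vdots \\ E\{\eta_M(t_i^-)\,\eta_1(t_i^-)\} & \cdots & E\{\eta_M(t_i^-)\,\eta_M(t_i^-)\}\end{bmatrix} + \mathbf{R}(t_i) \tag{4.82} $$

where

$$ E\{\eta_m(t_i^-)\,\eta_n(t_i^-)\} = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp'\,dp \tag{4.80} $$

In this section, we shall evaluate $E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\}$ using the approximate error $\tilde{e}(t_i^-,p)$ in order to calculate $\tilde{\mathbf{A}}(t_i)$. Note that, for the first time, we have not reduced the dimension, but rather, we have simply introduced an approximation so that we could implement the operator using a digital algorithm.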

Substituting the approximate error in Equation (4.39) in for the error in Equation (4.80) yields

$$ E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\} = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}dp'\,dp \tag{4.196} $$

Now use Equation (4.39) to obtain

$$ E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\} = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} E\left\{\sum_{j=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)\sum_{k=0}^{N-1}\epsilon_k(t_i^-)\,\beta_k(p')\right\}dp'\,dp \tag{4.197} $$

$$ = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M} E\left\{\sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)\,\epsilon_k(t_i^-)\,\beta_k(p')\right\}dp'\,dp \tag{4.198} $$

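Then, moving the expectation operator inside the double sum results in

$$ E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\} = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M}\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\beta_j(p)\,\epsilon_k(t_i^-)\,\beta_k(p')\right\}dp'\,dp \tag{4.199} $$

$$ = \int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M}\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p)\,\beta_k(p')\,dp'\,dp \tag{4.200} $$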

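Since the sums are finite, we can factor out the terms independent of $p$ and $p'$

$$ E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\} = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\int_{(m-1)/M}^{m/M}\int_{(n-1)/M}^{n/M}\beta_j(p)\,\beta_k(p')\,dp'\,dp \tag{4.201} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\int_{(m-1)/M}^{m/M}\beta_j(p)\,dp\int_{(n-1)/M}^{n/M}\beta_k(p')\,dp' \tag{4.202} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\mu_{m,j}\,\mu_{n,k} \tag{4.203} $$

where $\mu_{m,j}$ is the integrated basis that we defined in Equation (4.185). Using matrix notation, we can rewrite the $mn$th element as

$$ E\{\tilde{\eta}_m(t_i^-)\,\tilde{\eta}_n(t_i^-)\} = \boldsymbol{\mu}_m^T\,E\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\}\,\boldsymbol{\mu}_n = \boldsymbol{\mu}_m^T\,\tilde{\mathbf{P}}(t_i^-)\,\boldsymbol{\mu}_n \tag{4.204} $$

Hence our approximation of $H\,P(t_i^-)\,H^*$ is

$$ \begin{bmatrix}E\{\tilde{\eta}_1(t_i^-)\,\tilde{\eta}_1(t_i^-)\} & \cdots & E\{\tilde{\eta}_1(t_i^-)\,\tilde{\eta}_M(t_i^-)\} \\ \vdots & \ddots & \vdots \\ E\{\tilde{\eta}_M(t_i^-)\,\tilde{\eta}_1(t_i^-)\} & \cdots & E\{\tilde{\eta}_M(t_i^-)\,\tilde{\eta}_M(t_i^-)\}\end{bmatrix} = \begin{bmatrix}\boldsymbol{\mu}_1^T\tilde{\mathbf{P}}(t_i^-)\boldsymbol{\mu}_1 & \cdots & \boldsymbol{\mu}_1^T\tilde{\mathbf{P}}(t_i^-)\boldsymbol{\mu}_M \\ \vdots & \ddots & \vdots \\ \boldsymbol{\mu}_M^T\tilde{\mathbf{P}}(t_i^-)\boldsymbol{\mu}_1 & \cdots & \boldsymbol{\mu}_M^T\tilde{\mathbf{P}}(t_i^-)\boldsymbol{\mu}_M\end{bmatrix} \tag{4.205} $$

$$ = \begin{bmatrix}\boldsymbol{\mu}_1^T \\ \vdots \\ \boldsymbol{\mu}_M^T\end{bmatrix}\tilde{\mathbf{P}}(t_i^-)\begin{bmatrix}\boldsymbol{\mu}_1 & \cdots & \boldsymbol{\mu}_M\end{bmatrix} \tag{4.206} $$

$$ = \tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T \tag{4.207} $$

Therefore, the approximate filter-computed residual covariance matrix at time $t_i$ is defined as

$$ \tilde{\mathbf{A}}(t_i) = \tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i) \in \mathbb{R}^{M\times M} \tag{4.208} $$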
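
Equation (4.208) is an ordinary matrix triple product plus the measurement noise covariance. The sketch below spells this out with plain lists so that it has no dependencies; the helper names and every numerical value are hypothetical and serve only as an illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def residual_covariance(h, p_minus, r):
    """A = H P(t_i^-) H^T + R, as in Eq. (4.208)."""
    hpht = matmul(matmul(h, p_minus), transpose(h))
    return [[hpht[i][j] + r[i][j] for j in range(len(r))] for i in range(len(r))]

# Hypothetical M = 2, N = 2 example, for illustration only.
h = [[0.5, 0.45],
     [0.5, -0.45]]
p_minus = [[1.0, 0.0],
           [0.0, 1.0]]
r = [[0.01, 0.0],
     [0.0, 0.01]]
a = residual_covariance(h, p_minus, r)
```

The result is an $M \times M$ matrix regardless of how many basis functions $N$ are retained, which is why the subsequent inversion stays small.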

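4.3.13 The Kalman Gain Transformation Matrix $\tilde{\mathbf{K}}$. In Section 4.3.4 we found that the gain $K(t_i) = P(t_i^-)\,H^*\mathbf{A}^{-1}(t_i)$ can be written as

$$ K(t_i) = \begin{bmatrix}\int_0^{1/M} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp' \\ \vdots \\ \int_{(M-1)/M}^{1} E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}dp'\end{bmatrix}^T\mathbf{A}^{-1}(t_i) \tag{4.95} $$

In order to evaluate the Kalman gain, we shall use the approximate error function, $\tilde{e}(t_i^-,p) = \sum_{j=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)$, and the approximate filter-computed error covariance, $\tilde{\mathbf{A}}(t_i)$, given in Equation (4.208).

Collecting the $m$th element from Equation (4.215) and the results from Equation (4.208) for the projected filter-computed residual covariance, we have the representation of the Kalman gain transformation in the subspace chosen, also known as the Kalman gain matrix:

$$ \tilde{K}(t_i) = \begin{bmatrix}\boldsymbol{\beta}^T(p)\,\tilde{\mathbf{P}}(t_i^-)\,\boldsymbol{\mu}_1 & \cdots & \boldsymbol{\beta}^T(p)\,\tilde{\mathbf{P}}(t_i^-)\,\boldsymbol{\mu}_M\end{bmatrix}\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1} \tag{4.216} $$

$$ = \boldsymbol{\beta}^T(p)\,\tilde{\mathbf{P}}(t_i^-)\begin{bmatrix}\boldsymbol{\mu}_1 & \cdots & \boldsymbol{\mu}_M\end{bmatrix}\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1} \tag{4.217} $$

$$ = \boldsymbol{\beta}^T(p)\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1} \tag{4.218} $$

Thus the Kalman gain matrix with respect to the basis $\mathcal{B}_N$ is$^{20}$

$$ \tilde{\mathbf{K}}(t_i) = \left[\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\right]_{\mathcal{B}_N} \in \mathbb{R}^{N\times M} \tag{4.219} $$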
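
A minimal sketch of Equation (4.219) follows. Rather than forming the inverse explicitly, it solves a linear system, which is a substitution on my part and not something the text prescribes; the Gauss-Jordan helper, the function names, and the numbers are all assumptions for illustration. The sketch relies on the covariance and residual covariance matrices being symmetric.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for row in range(n):
            if row != col:
                f = aug[row][col]
                aug[row] = [vr - f * vc for vr, vc in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def kalman_gain(p_minus, h, r):
    """K = P H^T (H P H^T + R)^{-1}, computed via K^T = A^{-1} (H P) for symmetric A and P."""
    hp = matmul(h, p_minus)                       # M x N
    hpht = matmul(hp, transpose(h))               # M x M
    a = [[hpht[i][j] + r[i][j] for j in range(len(r))] for i in range(len(r))]
    return transpose(solve(a, hp)), a             # gain is N x M

# Hypothetical M = 2, N = 2 example, for illustration only.
p_minus = [[1.0, 0.2], [0.2, 0.5]]
h = [[0.5, 0.45], [0.5, -0.45]]
r = [[0.01, 0.0], [0.0, 0.01]]
k, a = kalman_gain(p_minus, h, r)
```

Multiplying the returned gain by the residual covariance should reproduce $\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T$, which is a convenient consistency check.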

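20 See the coordinate vector notation introduced in Equation (4.37) on page 4-15.

4.3.14 The Updated State Estimate $\hat{\tilde{x}}(t_i^+)$. The approximate updated state estimate is $\hat{\tilde{x}}(t_i^+)$

$$ \hat{\tilde{x}}(t_i^+) = \hat{\tilde{x}}(t_i^-) + \tilde{K}(t_i)\,\tilde{\mathbf{r}}(t_i) \tag{4.220} $$

$$ = \hat{\boldsymbol{\alpha}}^T(t_i^-)\,\boldsymbol{\beta}(p) + \tilde{K}(t_i)\,\tilde{\mathbf{r}}(t_i) \tag{4.221} $$

where the approximate measurement residual (approximate because the predicted measurement is an approximation) is given by

$$ \tilde{\mathbf{r}}(t_i) = \mathbf{z}_i - H\,\hat{\tilde{x}}(t_i^-) = \mathbf{z}_i - \tilde{\mathbf{H}}\,\hat{\boldsymbol{\alpha}}(t_i^-) \tag{4.222} $$

Substituting in for the Kalman gain matrix defined in Equation (4.218) yields

$$ \hat{\tilde{x}}(t_i^+) = \hat{\boldsymbol{\alpha}}^T(t_i^-)\,\boldsymbol{\beta}(p) + \boldsymbol{\beta}^T(p)\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i) \tag{4.223} $$

$$ = \hat{\boldsymbol{\alpha}}^T(t_i^-)\,\boldsymbol{\beta}(p) + \left\{\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i)\right\}^T\boldsymbol{\beta}(p) \tag{4.224} $$

$$ = \left\{\hat{\boldsymbol{\alpha}}(t_i^-) + \tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i)\right\}^T\boldsymbol{\beta}(p) \tag{4.225} $$

where the second equality follows from the fact that the transpose of a scalar is a scalar. Additionally, since the basis $\mathcal{B}_N$ is the same for all functions, we can write the approximate state update in terms of the basis

$$ \hat{\tilde{x}}(t_i^+) = \left[\hat{\boldsymbol{\alpha}}(t_i^-) + \tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i)\right]_{\mathcal{B}_N} \tag{4.226} $$

Thus we can update the state by updating the approximate state function coefficients

$$ \hat{\boldsymbol{\alpha}}(t_i^+) = \hat{\boldsymbol{\alpha}}(t_i^-) + \tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T\left[\tilde{\mathbf{H}}\,\tilde{\mathbf{P}}(t_i^-)\,\tilde{\mathbf{H}}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i) \tag{4.227} $$

where the propagated state coefficients, $\hat{\boldsymbol{\alpha}}(t_i^-)$, are defined in Equation (4.131), the propagated error covariance matrix, $\tilde{\mathbf{P}}(t_i^-)$, is defined in Equation (4.162) and computed in Equation (4.181), the measurement distributor matrix, $\tilde{\mathbf{H}}$, is described in Equation (4.192), and the measurement noise covariance, $\mathbf{R}(t_i)$, is assumed known.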
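
The coefficient update can be sketched in a few lines once a gain matrix is available. In the minimal sketch below the gain is passed in as an argument (assumed to have been computed beforehand, for example per Equation (4.219)); the function name and the scalar test values are hypothetical.

```python
def update_coefficients(alpha_minus, k, h, z):
    """Form the residual z - H alpha^- (Eq. 4.222) and add K times it to alpha^-."""
    n_modes, m_sensors = len(alpha_minus), len(z)
    residual = [z[m] - sum(h[m][n] * alpha_minus[n] for n in range(n_modes))
                for m in range(m_sensors)]
    alpha_plus = [alpha_minus[n] + sum(k[n][m] * residual[m] for m in range(m_sensors))
                  for n in range(n_modes)]
    return alpha_plus, residual

# Hypothetical scalar case (N = 1, M = 1) with gain 0.5, for illustration only.
alpha_plus, residual = update_coefficients([0.0], [[0.5]], [[1.0]], [2.0])

# A measurement that matches the prediction leaves the coefficients unchanged.
same, zero_residual = update_coefficients([1.0, 0.0], [[0.3], [0.1]], [[1.0, 0.0]], [1.0])
```

Only the $N$ coefficients are stored and updated; the state function itself can be recovered at any spatial location from $\hat{\boldsymbol{\alpha}}^T(t_i^+)\,\boldsymbol{\beta}(p)$ when needed.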

4.3.15 The Updated Error Covariance Operator $P(t_i^+)$. To generate a
matrix representation for the updated error covariance operator we begin by recasting
Equation (3.269) for this problem as

$$ P(t_i^+) = P(t_i^-) - K(t_i)\,H\,P(t_i^-) \tag{4.228} $$

For an arbitrary state $x$ at time $t_i$,

$$ P(t_i^+)\,x(t_i) = \left[P(t_i^-) - K(t_i)\,H\,P(t_i^-)\right]x(t_i) \tag{4.229} $$

$$ = P(t_i^-)\,x(t_i) - K(t_i)\,H\,P(t_i^-)\,x(t_i) \tag{4.230} $$

By definition,

$$ P(t_i^-)\,x(t_i) = E\left\{\left[e(t_i^-)\circ e(t_i^-)\right] \,\middle|\, \mathbf{Z}(t_{i-1}) = \mathbf{Z}_{i-1}\right\}x(t_i) \tag{4.231} $$

and (dropping the explicit conditioning on the measurement history) this can be
written as

$$ P(t_i^-)\,x(t_i) = E\left\{\left[e(t_i^-)\circ e(t_i^-)\right]x(t_i)\right\} \tag{4.232} $$

$$ = E\left\{e(t_i^-)\,\langle e(t_i^-),\,x(t_i)\rangle_{L_2}\right\} \tag{4.233} $$

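where the second line follows from the definition of the outer product given in
Equation (3.6) on page 3-10. Substituting this result into Equation (4.230) and then
expanding the measurement distributor transformation in the following line yields

$$ P(t_i^+)\,x(t_i) = E\left\{e(t_i^-)\,\langle e(t_i^-),x(t_i)\rangle_{L_2}\right\} - K(t_i)\,H\,E\left\{e(t_i^-)\,\langle e(t_i^-),x(t_i)\rangle_{L_2}\right\} \tag{4.234} $$

Explicitly writing out the spatial dependence of the state and the error yields

$$ P(t_i^+)\,x(t_i) = E\left\{e(t_i^-,p)\,\langle e(t_i^-,p),x(t_i,p)\rangle_{L_2}\right\} - K(t_i)\begin{bmatrix} \int_0^{1/M} E\left\{e(t_i^-,p)\,\langle e(t_i^-,p),x(t_i,p)\rangle_{L_2}\right\}dp \\ \vdots \\ \int_{(M-1)/M}^{1} E\left\{e(t_i^-,p)\,\langle e(t_i^-,p),x(t_i,p)\rangle_{L_2}\right\}dp \end{bmatrix} \tag{4.235} $$

The first term in Equation (4.235) and the repeated integrand are the same;
evaluating yields

$$ E\left\{e(t_i^-,p)\,\langle e(t_i^-,p),x(t_i,p)\rangle_{L_2}\right\} = E\left\{e(t_i^-,p)\int_0^1 e(t_i^-,p')\,x(t_i,p')\,dp'\right\} \tag{4.236} $$

$$ = E\left\{\int_0^1 e(t_i^-,p)\,e(t_i^-,p')\,x(t_i,p')\,dp'\right\} \tag{4.237} $$

$$ = \int_0^1 E\left\{e(t_i^-,p)\,e(t_i^-,p')\,x(t_i,p')\right\}dp' \tag{4.238} $$

$$ = \int_0^1 E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}x(t_i,p')\,dp' \tag{4.239} $$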
Substituting the results of Equation (4.239) back into Equation (4.235) gives

$$ P(t_i^+)\,x(t_i) = P(t_i^-)\,x(t_i) - K(t_i)\,H\,P(t_i^-)\,x(t_i) \tag{4.240} $$

$$ = \int_0^1 E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}x(t_i,p')\,dp' - K(t_i)\begin{bmatrix} \int_0^{1/M}\!\int_0^1 E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}x(t_i,p')\,dp'\,dp \\ \vdots \\ \int_{(M-1)/M}^{1}\!\int_0^1 E\left\{e(t_i^-,p)\,e(t_i^-,p')\right\}x(t_i,p')\,dp'\,dp \end{bmatrix} \tag{4.241} $$

where the Kalman gain is as defined in Equation (4.95).

Now that we have written out $P(t_i^+)\,x(t_i)$, we can substitute in the
finite-dimensional approximations to determine the matrix representation for $P(t_i^+)$. As
before, we replace the infinite-dimensional error and state functions with their
truncated Fourier series approximations, and the Kalman gain approximation was given
by Equation (4.219); thus


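$$ \tilde{P}(t_i^+)\,\tilde{x}(t_i) = \tilde{P}(t_i^-)\,\tilde{x}(t_i) - \tilde{K}(t_i)\,H\,\tilde{P}(t_i^-)\,\tilde{x}(t_i) \tag{4.242} $$

$$ = \int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp' - \tilde{K}(t_i)\begin{bmatrix} \int_0^{1/M}\!\int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp'\,dp \\ \vdots \\ \int_{(M-1)/M}^{1}\!\int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp'\,dp \end{bmatrix} \tag{4.243} $$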


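Evaluating the first term, $\tilde{P}(t_i^-)\,\tilde{x}(t_i)$, we get

$$ \int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp' = \int_0^1 E\left\{\sum_{j=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)\sum_{k=0}^{N-1}\epsilon_k(t_i^-)\,\beta_k(p')\right\}\sum_{l=0}^{N-1}\alpha_l(t_i)\,\beta_l(p')\,dp' \tag{4.244} $$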
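Extending the expectation operator to include the deterministic quantities yields

$$ \int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp' = \int_0^1 E\left\{\sum_{j=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)\sum_{k=0}^{N-1}\epsilon_k(t_i^-)\,\beta_k(p')\sum_{l=0}^{N-1}\alpha_l(t_i)\,\beta_l(p')\right\}dp' \tag{4.245} $$

$$ = \int_0^1 E\left\{\sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}\epsilon_j(t_i^-)\,\beta_j(p)\,\epsilon_k(t_i^-)\,\beta_k(p')\,\alpha_l(t_i)\,\beta_l(p')\right\}dp' \tag{4.246} $$

and then we move the expectation operator inside the summations

$$ \int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp' = \int_0^1 \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\beta_j(p)\,\epsilon_k(t_i^-)\,\beta_k(p')\,\alpha_l(t_i)\,\beta_l(p')\right\}dp' \tag{4.247} $$

$$ = \int_0^1 \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p)\,\beta_k(p')\,\alpha_l(t_i)\,\beta_l(p')\,dp' \tag{4.248} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p)\,\alpha_l(t_i)\int_0^1 \beta_k(p')\,\beta_l(p')\,dp' \tag{4.249} $$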


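where in the last line we move all the terms independent of $p'$ out in front of the
integral. Note that the integral $\int_0^1 \beta_k(p')\,\beta_l(p')\,dp'$ is simply the Kronecker delta,
thus

$$ \int_0^1 E\left\{\tilde{e}(t_i^-,p)\,\tilde{e}(t_i^-,p')\right\}\tilde{x}(t_i,p')\,dp' = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p)\,\alpha_l(t_i)\,\delta_{kl} \tag{4.250} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\alpha_k(t_i)\,E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p) \tag{4.251} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,E\left\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\right\}\boldsymbol{\beta}(p) \tag{4.252} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\mathbf{P}(t_i^-)\,\boldsymbol{\beta}(p) \tag{4.253} $$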
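and therefore the mth element of the vector on the right-hand side of Equation
(4.241) is

$$ \int_{(m-1)/M}^{m/M} \boldsymbol{\alpha}^T(t_i)\,E\left\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\right\}\boldsymbol{\beta}(p)\,dp = \int_{(m-1)/M}^{m/M}\sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\alpha_k(t_i)\,E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\beta_j(p)\,dp \tag{4.254} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\alpha_k(t_i)\,E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\int_{(m-1)/M}^{m/M}\beta_j(p)\,dp \tag{4.255} $$

$$ = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1}\alpha_k(t_i)\,E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}h_{m,j} \tag{4.256} $$

where the integrated basis, $h_{m,j} = \int_{(m-1)/M}^{m/M}\beta_j(p)\,dp$, was defined in Equation (4.185).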
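The Kronecker-delta property used in Equation (4.250) can be checked numerically. The following Python sketch is illustrative only: it assumes the cosine basis $\beta_0(p) = 1$, $\beta_n(p) = \sqrt{2}\cos(n\pi p)$ on $[0, 1]$ from Section 4.2.8, and the function names and the midpoint-rule resolution are hypothetical choices, not part of the original software.

```python
import math

def beta(n, p):
    """Assumed basis element: beta_0 = 1, beta_n = sqrt(2) cos(n pi p)."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * p)

def inner_product(k, l, steps=2000):
    """Midpoint-rule approximation of the L2 inner product on [0, 1]."""
    dp = 1.0 / steps
    total = 0.0
    for s in range(steps):
        p = (s + 0.5) * dp
        total += beta(k, p) * beta(l, p)
    return total * dp

def gram_matrix(N):
    """N-by-N matrix of inner products; should approximate the identity."""
    return [[inner_product(k, l) for l in range(N)] for k in range(N)]
```

If the basis is orthonormal as assumed, `gram_matrix(N)` should be close to the N-by-N identity, which is what collapses the triple sum in Equation (4.249) to the double sum in Equation (4.251).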

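Rearranging gives

$$ \int_{(m-1)/M}^{m/M} \boldsymbol{\alpha}^T(t_i)\,E\left\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\right\}\boldsymbol{\beta}(p)\,dp = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} h_{m,j}\,E\left\{\epsilon_j(t_i^-)\,\epsilon_k(t_i^-)\right\}\alpha_k(t_i) = \mathbf{h}_m^T\,E\left\{\boldsymbol{\epsilon}(t_i^-)\,\boldsymbol{\epsilon}^T(t_i^-)\right\}\boldsymbol{\alpha}(t_i) \tag{4.257-4.259} $$

Thus,

$$ H\,\tilde{P}(t_i^-)\,\tilde{x}(t_i) = \begin{bmatrix} \mathbf{h}_1^T\,\mathbf{P}(t_i^-)\,\boldsymbol{\alpha}(t_i) \\ \vdots \\ \mathbf{h}_M^T\,\mathbf{P}(t_i^-)\,\boldsymbol{\alpha}(t_i) \end{bmatrix} \tag{4.260} $$

$$ = \begin{bmatrix} \mathbf{h}_1^T \\ \vdots \\ \mathbf{h}_M^T \end{bmatrix}\mathbf{P}(t_i^-)\,\boldsymbol{\alpha}(t_i) \tag{4.261} $$

$$ = \mathbf{H}\,\mathbf{P}(t_i^-)\,\boldsymbol{\alpha}(t_i) \tag{4.262} $$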
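It then follows that the first term of Equation (4.241) becomes

$$ \tilde{P}(t_i^+)\,\tilde{x}(t_i) = \boldsymbol{\alpha}^T(t_i)\,\mathbf{P}(t_i^-)\,\boldsymbol{\beta}(p) - \boldsymbol{\beta}^T(p)\,\mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\mathbf{H}\mathbf{P}(t_i^-)\,\boldsymbol{\alpha}(t_i) \tag{4.263} $$

$$ = \boldsymbol{\alpha}^T(t_i)\,\mathbf{P}(t_i^-)\,\boldsymbol{\beta}(p) - \boldsymbol{\alpha}^T(t_i)\,\mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\mathbf{H}\mathbf{P}(t_i^-)\,\boldsymbol{\beta}(p) \tag{4.264} $$

$$ = \boldsymbol{\alpha}^T(t_i)\left\{\mathbf{P}(t_i^-) - \mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\mathbf{H}\mathbf{P}(t_i^-)\right\}\boldsymbol{\beta}(p) \tag{4.265} $$

Since Equation (4.265) applies to every state, by operator equality, the error
covariance matrix following the measurement update is

$$ \mathbf{P}(t_i^+) = \mathbf{P}(t_i^-) - \mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\mathbf{H}\mathbf{P}(t_i^-) \tag{4.266} $$

where the propagated error covariance matrix, $\mathbf{P}(t_i^-)$, is defined in Equation (4.162)
and computed in Equation (4.181), the measurement distributor matrix, $\mathbf{H}$, is
described in Equation (4.192), and the measurement noise covariance, $\mathbf{R}(t_i)$, is assumed
known.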
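To make the measurement update concrete, the following Python sketch applies Equations (4.222), (4.227), and (4.266) for the special case of a single scalar measurement, so that the bracketed inverse reduces to a scalar division. The function name, the list-of-lists matrix representation, and the restriction to one measurement are assumptions made for illustration; the covariance is also assumed symmetric.

```python
def scalar_measurement_update(alpha, P, h, r, z):
    """One scalar-measurement update of coefficient estimate and covariance.

    alpha : list of N propagated coefficient estimates
    P     : N x N propagated covariance (list of lists, assumed symmetric)
    h     : list of N entries (one row of the measurement matrix H)
    r     : scalar measurement noise variance
    z     : scalar measurement
    """
    n = len(alpha)
    residual = z - sum(h[j] * alpha[j] for j in range(n))            # Eq. (4.222)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]   # P H^T
    s = sum(h[i] * Ph[i] for i in range(n)) + r                      # H P H^T + R
    gain = [Ph[i] / s for i in range(n)]                             # P H^T [.]^-1
    alpha_plus = [alpha[i] + gain[i] * residual for i in range(n)]   # Eq. (4.227)
    # Eq. (4.266); uses (H P)_j = (P H^T)_j, which holds for symmetric P.
    P_plus = [[P[i][j] - gain[i] * Ph[j] for j in range(n)] for i in range(n)]
    return alpha_plus, P_plus, residual, s
```

When the measurement noise covariance is diagonal, applying this scalar update once per sensor in sequence gives the same result as the batch form with the M-by-M inverse, which is one reason the scalar form is a convenient way to check an implementation.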

4.3.16 Summary. The previous section began by describing the deterministic
heat equation. Then we proceeded to develop the infinite-dimensional,
continuous-time dynamics model of the heat equation and a sampled-data measurement
model, which led to the creation of the equivalent infinite-dimensional, discrete-time
dynamics model. At the end of the section we described how we could derive
an essentially-equivalent finite-dimensional discrete-time model from the equivalent
infinite-dimensional, discrete-time model. In this section we have systematically
determined the finite-dimensional matrix approximations and representations of the
multitude of components that comprise the ISKF in such a fashion that we have
synthesized a sampled-data Kalman filter to estimate optimally a finite number of
the coefficients associated with a Fourier series expansion of the true state temperature
function of the slender cylindrical rod. In preparation for simulating an MMAE
to estimate the temperature of the rod adaptively, we summarize the important
equations for this sampled-data Kalman filter.

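The state coefficients are propagated by

$$ \hat{\boldsymbol{\alpha}}(t_{i+1}^-) = \boldsymbol{\Phi}(\Delta t_{i+1})\,\hat{\boldsymbol{\alpha}}(t_i^+) + \mathbf{B}_d(t_i)\,\mathbf{u}(t_i) \tag{4.131} $$

with error covariance

$$ \mathbf{P}(t_{i+1}^-) = \boldsymbol{\Phi}(\Delta t_{i+1})\,\mathbf{P}(t_i^+)\,\boldsymbol{\Phi}^T(\Delta t_{i+1}) + \mathbf{Q}_d(t_i) \tag{4.181} $$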


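where the state transition matrix, $\boldsymbol{\Phi}(\Delta t_{i+1})$, for $\Delta t_{i+1} = t_{i+1} - t_i$, is N by N with
diagonal elements

$$ \left[\boldsymbol{\Phi}(t_{i+1} - t_i)\right]_n = e^{-\kappa n^2\pi^2(t_{i+1} - t_i)}, \qquad n = 0, 1, \ldots, N-1 \tag{4.112} $$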

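the equivalent discrete-time input distributor is an N by N diagonal matrix with
elements

$$ \left[\mathbf{B}_d(t_i)\right]_n = \begin{cases} t_{i+1} - t_i, & n = 0 \\[4pt] \dfrac{1 - e^{-\kappa n^2\pi^2(t_{i+1} - t_i)}}{\kappa n^2\pi^2}, & n = 1, 2, \ldots, N-1 \end{cases} \tag{4.124} $$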

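and the equivalent discrete-time system dynamics noise covariance is an N by N
diagonal matrix with elements

$$ \left[\mathbf{Q}_d(t_i)\right]_n = \begin{cases} Q\,[t_{i+1} - t_i], & n = 0 \\[4pt] Q\,\dfrac{1 - e^{-2\kappa n^2\pi^2(t_{i+1} - t_i)}}{2\kappa n^2\pi^2}, & n = 1, 2, \ldots, N-1 \end{cases} \tag{4.146} $$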
Since $\boldsymbol{\Phi}$ and $\mathbf{B}_d$ are diagonal matrices, the approximate state function coefficient
estimates, $\hat{\alpha}_0, \ldots, \hat{\alpha}_{N-1}$, are independently propagated; using Equation (4.131) we
can write the propagation of the nth coefficient as

$$ \hat{\alpha}_n(t_i^-) = \left[\boldsymbol{\Phi}(t_i - t_{i-1})\right]_n \hat{\alpha}_n(t_{i-1}^+) + \left[\mathbf{B}_d(t_{i-1})\right]_n u_n(t_{i-1}) \tag{4.267} $$
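The per-mode propagation can be sketched in a few lines of Python. This is a minimal illustration, assuming the diagonal elements in Equations (4.112), (4.124), and (4.146) take the closed forms $e^{-\lambda_n \Delta t}$, $(1 - e^{-\lambda_n \Delta t})/\lambda_n$, and $Q(1 - e^{-2\lambda_n \Delta t})/(2\lambda_n)$ with $\lambda_n = \kappa n^2 \pi^2$; the function names are hypothetical.

```python
import math

def discrete_elements(n, kappa, dt, Q):
    """Return (phi_n, bd_n, qd_n), the nth diagonal elements of Phi, B_d, Q_d."""
    if n == 0:
        # The constant mode does not decay: Phi = 1, B_d = dt, Q_d = Q * dt.
        return 1.0, dt, Q * dt
    lam = kappa * n ** 2 * math.pi ** 2
    phi = math.exp(-lam * dt)
    bd = (1.0 - phi) / lam
    qd = Q * (1.0 - math.exp(-2.0 * lam * dt)) / (2.0 * lam)
    return phi, bd, qd

def propagate_mode(alpha_n, var_n, u_n, n, kappa, dt, Q):
    """Propagate one coefficient estimate and its covariance diagonal element."""
    phi, bd, qd = discrete_elements(n, kappa, dt, Q)
    alpha_minus = phi * alpha_n + bd * u_n      # nth row of Eq. (4.131)
    var_minus = phi * var_n * phi + qd          # nth diagonal of Eq. (4.181)
    return alpha_minus, var_minus
```

Because the state transition matrix is diagonal, the nth diagonal element of the propagated covariance depends only on the nth diagonal element before propagation, so the loop over modes needs no matrix products.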


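The measurement update of the approximate state function coefficient estimates is
performed by

$$ \hat{\boldsymbol{\alpha}}(t_i^+) = \hat{\boldsymbol{\alpha}}(t_i^-) + \mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\tilde{\mathbf{r}}(t_i) \tag{4.227} $$

where the approximate residual is given by

$$ \tilde{\mathbf{r}}(t_i) = \mathbf{z}_i - \mathbf{H}\,\hat{\boldsymbol{\alpha}}(t_i^-) \tag{4.222} $$

with error covariance given as

$$ \mathbf{P}(t_i^+) = \mathbf{P}(t_i^-) - \mathbf{P}(t_i^-)\mathbf{H}^T\left[\mathbf{H}\mathbf{P}(t_i^-)\mathbf{H}^T + \mathbf{R}(t_i)\right]^{-1}\mathbf{H}\mathbf{P}(t_i^-) \tag{4.266} $$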


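where

$$ \mathbf{H} = \begin{bmatrix} h_{1,0} & h_{1,1} & \cdots & h_{1,N-1} \\ \vdots & \vdots & & \vdots \\ h_{M,0} & h_{M,1} & \cdots & h_{M,N-1} \end{bmatrix}_{M \times N} \tag{4.192} $$

and the elements of $\mathbf{H}$ are

$$ h_{m,n} = \begin{cases} \dfrac{1}{M}, & n = 0 \\[6pt] \dfrac{\sqrt{2}}{n\pi}\,\sin(n\pi p)\Big|_{p=(m-1)/M}^{p=m/M}, & n = 1, 2, \ldots, N-1 \end{cases} \tag{4.195} $$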
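A short Python sketch of how the measurement distributor matrix could be assembled from Equation (4.195) follows. It assumes the normalized cosine basis $\beta_n(p) = \sqrt{2}\cos(n\pi p)$, so each element is the integral of a basis element over one of the M equal sensor segments; the function name and list-of-lists layout are illustrative choices.

```python
import math

def measurement_matrix(M, N):
    """Build the M x N matrix H with h[m][n] = integral of beta_n over segment m."""
    H = []
    for m in range(1, M + 1):
        lo, hi = (m - 1) / M, m / M          # segment m spans [(m-1)/M, m/M]
        row = [1.0 / M]                      # n = 0: integral of the constant 1
        for n in range(1, N):
            scale = math.sqrt(2.0) / (n * math.pi)
            row.append(scale * (math.sin(n * math.pi * hi) - math.sin(n * math.pi * lo)))
        H.append(row)
    return H
```

A useful sanity check is that, for every n greater than zero, the column sum over all M segments is zero (the integral of a cosine basis element over the whole rod), while the n = 0 column sums to one.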
4.4 Multiple Model Adaptive Estimation


Now that we have determined the finite-dimensional matrix representations
needed to implement the ISKF using a digital computer algorithm, we next
modify the MMAE framework, which was reviewed in Chapter II and extended in
Chapter III, in order to implement the MMAE in a simulation of this example in
Chapter V.

The formation of the elemental filters was discussed in Section 2.3.3.3. Recall
that each elemental filter in the bank (shown in Figure 2.1 on page 2-8) is based upon
a different hypothesis for a parameter value used to model the real-world system,
i.e., the kth elemental filter is constructed assuming that $\mathbf{a}(t_i) = \mathbf{a}_k$, where $\mathbf{a}_k$ is
a member of the discrete set $A$, which is a subset of $\mathbb{R}^J$, the J-dimensional real vector
space; J specifies the number of parameters that are adaptively estimated.

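In this research, we assume that the prior probability is $p_k(t_0) = 1/K$ for
$k = 1, \ldots, K$ and that the hypothesis conditional probability for all K hypotheses
[125, 107, 11, 108, 130, 132] is computed recursively by

$$ p_k(t_i) = \frac{f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1})\;p_k(t_{i-1})}{\sum_{j=1}^{K} f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_j, \mathbf{Z}_{i-1})\;p_j(t_{i-1})} \tag{4.268} $$

where the conditional PDF,

$$ f_{\mathbf{z}(t_i)|\mathbf{a}(t_i),\mathbf{Z}(t_{i-1})}(\mathbf{z}_i \mid \mathbf{a}_k, \mathbf{Z}_{i-1}) = \beta_k(t_i)\,\exp\left\{-\tfrac{1}{2}L_k(t_i)\right\} \tag{4.269} $$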


is a zero-mean Gaussian with covariance $\mathbf{A}_k(t_i)$. The PDF scale factor is given by

$$ \beta_k(t_i) = \frac{1}{(2\pi)^{M/2}\,|\mathbf{A}_k(t_i)|^{1/2}} \tag{4.270} $$

with measurement dimension M, and the likelihood quotient, which is a measure of
the “correctness” of the parameter values for this particular model [130], being

$$ L_k(t_i) = \tilde{\mathbf{r}}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\tilde{\mathbf{r}}_k(t_i) \tag{4.271} $$

where $\tilde{\mathbf{r}}_k(t_i)$ and $\mathbf{A}_k(t_i)$ are the approximate measurement residual and approximate
measurement residual covariance calculated by the kth elemental filter. Note that
the primary purpose of $\beta_k(t_i)$ is to scale the PDF so that it integrates to one and
that the important information to be retrieved from this PDF is contained in $L_k(t_i)$.
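The recursion in Equations (4.268) through (4.271) can be sketched as follows. This is a hedged illustration rather than the software used in this research: it assumes each elemental filter has already supplied its likelihood quotient $L_k(t_i)$ and the determinant of its residual covariance, and the function and argument names are hypothetical.

```python
import math

def update_probabilities(prior, quotients, dets, M):
    """One step of the hypothesis conditional probability recursion.

    prior     : list of p_k(t_{i-1}), one per elemental filter
    quotients : list of likelihood quotients L_k(t_i)
    dets      : list of determinants |A_k(t_i)| of the residual covariances
    M         : measurement dimension
    """
    weights = []
    for p, L, d in zip(prior, quotients, dets):
        scale = 1.0 / ((2.0 * math.pi) ** (M / 2.0) * math.sqrt(d))   # Eq. (4.270)
        density = scale * math.exp(-0.5 * L)                          # Eq. (4.269)
        weights.append(density * p)          # numerator of Eq. (4.268)
    total = sum(weights)                     # denominator of Eq. (4.268)
    return [w / total for w in weights]
```

With equal residual covariances, only the likelihood quotients separate the hypotheses: a filter whose quotient is near M keeps or gains probability, while a filter with a much larger quotient is driven toward zero within a few updates.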

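The blended state function coefficient estimate is

$$ \hat{\boldsymbol{\alpha}}_{\mathrm{MMAE}}(t_i^+) = \sum_{k=1}^{K} \hat{\boldsymbol{\alpha}}_k(t_i^+)\,p_k(t_i) \tag{4.272} $$

where $\hat{\boldsymbol{\alpha}}_k(t_i^+)$ is the state function coefficient estimate generated by the kth elemental
filter based on the assumption that the parameter vector $\mathbf{a}(t_i) = \mathbf{a}_k$. The conditional
covariance of the state function coefficients is

$$ \mathbf{P}_{\mathrm{MMAE}}(t_i^+) = \sum_{k=1}^{K}\left\{\mathbf{P}_k(t_i^+) + \left[\hat{\boldsymbol{\alpha}}_k(t_i^+) - \hat{\boldsymbol{\alpha}}_{\mathrm{MMAE}}(t_i^+)\right]\left[\hat{\boldsymbol{\alpha}}_k(t_i^+) - \hat{\boldsymbol{\alpha}}_{\mathrm{MMAE}}(t_i^+)\right]^T\right\}p_k(t_i) \tag{4.273} $$

where $\mathbf{P}_k(t_i^+)$ is the state coefficient error covariance computed by the kth elemental
filter. Since we are really interested in the temperature state function (versus a
vector of coefficients), use Equation (4.34) to recreate the state:

$$ \hat{x}(t_i, p) = \hat{\boldsymbol{\alpha}}^T(t_i)\,\boldsymbol{\beta}(p) \tag{4.274} $$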


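The parameter estimate is given by

$$ \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i) = \sum_{k=1}^{K} \mathbf{a}_k\,p_k(t_i) \tag{4.275} $$

with conditional covariance of $\mathbf{a}(t_i)$ [129]:

$$ \mathbf{P}_{a,\mathrm{MMAE}}(t_i) = \sum_{k=1}^{K}\left[\mathbf{a}_k - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i)\right]\left[\mathbf{a}_k - \hat{\mathbf{a}}_{\mathrm{MMAE}}(t_i)\right]^T p_k(t_i) \tag{4.276} $$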
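The probability-weighted blending in Equations (4.272), (4.273), (4.275), and (4.276) shares one pattern, sketched below in plain Python. The function name and the list-based vectors and matrices are assumptions made for illustration; the parameter estimate is obtained from the same routine by passing zero matrices as the per-hypothesis covariances.

```python
def blend(estimates, covariances, probs):
    """Probability-weighted mean and conditional covariance.

    estimates   : list of K vectors (lists of length n)
    covariances : list of K n x n matrices (lists of lists)
    probs       : list of K hypothesis conditional probabilities
    """
    n = len(estimates[0])
    mean = [sum(p * est[i] for est, p in zip(estimates, probs)) for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for est, P, p in zip(estimates, covariances, probs):
        d = [est[i] - mean[i] for i in range(n)]     # spread about the blended mean
        for i in range(n):
            for j in range(n):
                cov[i][j] += p * (P[i][j] + d[i] * d[j])
    return mean, cov
```

For example, two equally probable scalar hypotheses of 0.86 and 1.14 with zero per-hypothesis covariance blend to a mean of 1.0 with variance 0.14 squared, showing how disagreement among the hypotheses inflates the blended covariance.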
4.5 Summary

The purpose of this chapter was to demonstrate, using a practical, straightforward
example, the filtering theory developed in Chapter III. The temperature
along a slender cylindrical rod was observed using a sampled-data measurement
model, and an essentially-equivalent finite-dimensional discrete-time model (derived
from an infinite-dimensional, continuous-time model for the flow of heat, a scalar
stochastic heat equation) was employed to characterize the system dynamics for
this practical problem. In the bulk of this chapter, we systematically found the
finite-dimensional matrix approximations and representations for the multitude of
components that comprise the ISKF in such a fashion that we have synthesized a
sampled-data Kalman filter to estimate optimally a finite number of the coefficients
associated with a Fourier series expansion of the true state temperature function of
the slender cylindrical rod. Finally, we crafted a fixed-bank MMAE composed of
these finite-dimensional filters to estimate the temperature along the rod. In the
sequel to this chapter, we shall use a Monte Carlo simulation to investigate this
example further.
V. Simulation and Results


5.1 Introduction

This chapter discusses the results of six computer-based Monte Carlo simulations¹
in detail. The first five simulations feature state and/or parameter estimation
in the presence of an uncertain noise environment. Specifically, the first two
simulations involve estimating the covariance of the dynamics driving noise; this can be a
rather difficult task if the measurements are of relatively poor quality as compared to
the quality of the dynamics model. The quality (or precision) of the dynamics model
is expressed in terms of the noise covariance, $Q_d$, while the quality of the observation
model is related by the measurement-corruption noise covariance, $R$. In the first
two simulations, we indirectly estimate the dynamics noise covariance using
noise-corrupted measurements. In the next three simulations the measurement-corruption
noise covariance is found. The fourth simulation demonstrates the MMAE’s capability
to adjust to a linearly changing measurement-corruption noise covariance,
both increasing and decreasing over time. In the fifth simulation, the elemental
filters in the MMAE demonstrate their ability to change their status from poorly
modeling the real-world environment to correctly modeling it in just a few short
propagate/update cycles as the truth model measurement noise covariance abruptly
changes (twice) during the course of the simulation. The capability to identify an
unknown system parameter, often called system identification [9], is addressed in the
final simulation discussed in this chapter. The MMAE demonstrates its ability to
identify a system parameter (in this case, a material property of the slender cylindrical
rod), and from that we can ascertain the most likely rod material from a short
list of fairly distinct materials.

¹Maybeck’s [134] MATLAB-based MMAE software was modified for use in this dissertation to
create an approximate infinite-dimensional MMAE (AIMMAE). A technical report containing all
of the code necessary to duplicate the results reported herein is available upon request from either
Dr. Peter Maybeck at Peter.Maybeck@AFIT.edu or Dr. Scott Sallberg at Sallberg@IEEE.org.

All six simulations decomposed the temperature function into (at least) thirty
“states” for both the truth model and elemental filter models. Recall that these
states are actually the coefficients, $\{\alpha_0(t_i), \alpha_1(t_i), \ldots, \alpha_{29}(t_i)\}$, that correspond to
the basis elements, $\{1, \sqrt{2}\cos(\pi p), \ldots, \sqrt{2}\cos(29\pi p)\}$, used to decompose the
(possibly) infinite-dimensional state function; see the discussion in Section 4.2.8 on page
4-12. With thirty basis elements, both the mean-squared error (MSE) relative to the
actual temperature along the rod (for a ramp) and the Gibbs effect resulting from
the finite number of basis elements used to represent the signal are small
relative to the other choices investigated; ten, fifteen, twenty, thirty, forty, fifty,
sixty, seventy, eighty, and one hundred basis elements were initially used in this
preliminary study. Along another line of reasoning employing the state transition
matrix, given in Equation (4.112) on page 4-28, we may reasonably expect
that only a small number of state coefficients are needed to model the state
function adequately. For instance, given that the diagonal state transition matrix
has elements $[\boldsymbol{\Phi}(t_{i+1} - t_i)]_n = e^{-\kappa n^2\pi^2(t_{i+1} - t_i)}$, and choosing $\kappa = 1$ m²/sec and
$t_{i+1} - t_i = 0.01$ sec, we get $[\boldsymbol{\Phi}(t_{i+1} - t_i)]_{10} = 5.2 \times 10^{-5}$, $[\boldsymbol{\Phi}(t_{i+1} - t_i)]_{15} = 2.3 \times 10^{-10}$,
$[\boldsymbol{\Phi}(t_{i+1} - t_i)]_{20} = 7.2 \times 10^{-18}$, and $[\boldsymbol{\Phi}(t_{i+1} - t_i)]_{30} = 2.6 \times 10^{-39}$. Thus, N = 15
might be all that we need, since coefficients past the fifteenth are reduced by a
factor of a billion or more during each propagation cycle. Depending on the computer
resources available, computational loading may also limit the number of states
(coefficients) employed in the model. Finally, since the real truth model would be an
infinite-dimensional model, we have, in a sense, a reduced-order model.
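The decay factors quoted above are straightforward to reproduce. The Python sketch below evaluates the diagonal state transition element $e^{-\kappa n^2 \pi^2 \Delta t}$ for the stated values $\kappa = 1$ m²/sec and $\Delta t = 0.01$ sec; the function name and default arguments are illustrative assumptions.

```python
import math

def transition_element(n, kappa=1.0, dt=0.01):
    """nth diagonal element of the state transition matrix, Eq. (4.112)."""
    return math.exp(-kappa * n ** 2 * math.pi ** 2 * dt)

# Per-cycle attenuation of the modes discussed in the text.
decay = {n: transition_element(n) for n in (10, 15, 20, 30)}
```

Because the exponent grows with the square of the mode number, doubling n from 15 to 30 takes the per-cycle attenuation from roughly $10^{-10}$ to roughly $10^{-39}$, which is the basis for the suggestion that a modest number of coefficients may suffice.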

All of the simulations feature a slender cylindrical rod partitioned into five
segments; a separate measurement is taken over each segment. The five measurements
are recorded together at each time instant in the measurement vector. Preliminary
studies indicated that five segments struck a reasonable balance between too few and too
many segments. Fewer segments would have resulted in a coarser spatial discretization,
which would have made it more difficult to detect the onset of excitation (heat)
in simulation six and thus degrade system identification capabilities. However, we
did not want to describe the temperature perfectly at every point along the rod with
too many segments either.

Parameter                                  Symbol      Value
Thermal diffusivity constant               $\kappa$    1 m²/sec
Number of measurements (sensors)           $M$         5
Number of states (basis elements)          $N$         30
System dynamics noise strength             $Q$         5 (°C)²/sec
Measurement-corruption noise covariance    $R$         5 (°C)²
Initial state (temperature along rod)      $x_0$       20 °C
Initial state covariance                   $P_0$       25 (°C)²

Table 5.1 Truth Parameters

Fifty  Monte  Carlo  runs  were  used  in  each  simulation  to  generate  the  hundreds 
of  plots  seen  on  the  dozens  of  figures  on  the  following  pages.  Once  again,  we  note 
that  all  of  these  plots  were  created  from  a  set  of  fifty  Monte  Carlo  runs  that  utilized  a 
clock-based  seeding  of  the  random  number  generator.  Several  of  the  pertinent  truth 
model  parameters  are  stated  in  Table  5.1. 

Notes  pertaining  to  Table  5.1: 

1.  We  have  chosen  to  work  directly  with  the  dynamics  noise  strength,  Q,  since 
it  is  part  of  the  original  continuous-time  model  description,  see  Section  4.2.1, 
versus  the  equivalent  discrete-time  diffusion,  Qd,  which  is  calculated  by  the 
software  as  needed. 

2. Maybeck [129] discusses at length the tuning of elemental filters by adjusting
the Q/R ratio. When these noise parameters are matrices, the discussion shifts
to a ratio of their largest eigenvalues; the same trends still apply, although it
is somewhat more complicated if the matrices are not diagonal matrices. With
that in mind, we note that the truth measurement-corruption noise covariance
is a diagonal matrix $\mathbf{R} = 5\mathbf{I}$, where $\mathbf{I}$ is an M-by-M identity matrix. The
eigenvalues of $\mathbf{R}$ are all equal to five. Hence, when we say $R$, we are referring
to the largest (and only) eigenvalue of $\mathbf{R}$; however, we will call both of them the
measurement-corruption noise covariance and the context will dictate whether
we are referring to the matrix or the largest eigenvalue.

3. The eigenvalues of the initial state covariance $\mathbf{P}_0$ are all equal to twenty-five.
Hence, when we say $P_0$, we are referring to the largest (and only) eigenvalue
of $\mathbf{P}_0$; however, we will call both of them the initial state covariance and
the context will dictate whether we are referring to the matrix or the largest
eigenvalue.

4.  In  the  fourth  simulation,  the  truth  R  is  varied  linearly,  both  increasing  and 
decreasing. 

5.  In  the  fifth  simulation,  we  change  the  truth  measurement-corruption  noise 
covariance  twice  in  an  abrupt  fashion. 

6. For the first five simulations, the thermal diffusivity constant is set to $\kappa = 1$,
which for comparison purposes places it halfway between aluminum ($\kappa = 0.86$)
and copper ($\kappa = 1.14$). In the sixth simulation, we set the thermal diffusivity
constant to $\kappa = 0.86$ to perform system identification.

For each of the simulations, a short description sets up the goals and explains
a few pertinent facts about the truth model, some of which are tabulated if they
differ from Table 5.1. Additionally, we include a graphical description of the
parameter set to help assess how the elemental filters in the filter bank differ; see, for
example, Figure 5.12 on page 5-29. The simulation results feature a large collection
of figures. For the last five simulations, 2 through 6, a figure for initial results for
each simulation, such as Figure 5.13 on page 5-30 for an investigation into initial
state covariance settings, displays the overall probability flow among the elemental
filters. This plot is intended to emphasize the MMAE’s indication of the filter
based on the best hypothesis², and thus the elemental filter(s) most responsible for
the overall MMAE performance. In this dissertation, the best hypothesis is defined
as the hypothesis that gives rise to the elemental filter with the largest hypothesis
conditional probability; hence, the best elemental filter is the filter which receives
the highest hypothesis conditional probability. Following the introduction to the
simulation and a brief discussion of expected filter performance, there are two pages
of plots for each of the (three or five) elemental filters³ in the MMAE filter bank,
followed by a two-page set of plots for the blended filter. The first set of plots for
an elemental filter contains a full accounting of the filter’s progression through time
at three strategic points along the rod in plots (a) through (f); a plot of the
likelihood quotient appears in plot (g), and plot (h) contains the hypothesis conditional
probability; see Table 5.2 for the placement of these plots in the figure. Plot (h)
contains a comparison of the mean plus and minus one standard deviation of the
hypothesis conditional probability for each of the elemental filters in the bank for
each simulation; this plot enables a quick analysis of the probability flow among the
elemental filters.

(a) Temp at ‘left’ end of rod (p = 0)          (b) Error temp at ‘left’ end of rod (p = 0)
(c) Temp at ‘center’ of rod (p = 0.5)          (d) Error temp at ‘center’ of rod (p = 0.5)
(e) Temp at ‘right’ end of rod (p = 1)         (f) Error temp at ‘right’ end of rod (p = 1)
(g) Likelihood quotient:                       (h) Hypothesis conditional probability:
    $\tilde{\mathbf{r}}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\tilde{\mathbf{r}}_k(t_i)$                 $p_k(t_i)$

Table 5.2 Arrangement for Plots (a) through (h) for the kth Elemental Filter

²Equation (4.268), on page 4-55, is used to calculate these probabilities.

³For the first simulation, we do not give an individual accounting of the elemental filters since the
focus of the first simulation was to discover/identify trends regarding the filter bank composition.
With that in mind, the figures for simulation one are composite summaries of the hypothesis
conditional probability histories for all five elemental filters in the filter bank.
For plots (a), (c), and (e), the solid line represents the mean elemental filter
estimate (a mean taken over the 50 Monte Carlo runs) while the dashed line
represents the truth state. In plots (b), (d), and (f), the solid line represents the
mean elemental filter error, the dash-dot lines are at plus and minus one truth
standard deviation from the elemental filter mean error (solid line), and dashed lines
are zero (the filter-assumed mean error) plus and minus filter-computed standard
deviation. Plot (g) has a solid line for the mean value of the likelihood quotient,
$L_k(t_i) = \tilde{\mathbf{r}}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\tilde{\mathbf{r}}_k(t_i)$, a measure of the “correctness” of the parameter values
for the kth elemental filter [130, 132]; the dash-dot lines are at plus and minus one
truth standard deviation. In plot (h), the solid line traces out the mean hypothesis
conditional probability, $p_k(t_i)$, the probability that the assumed parameter value is
correct conditioned on the observed measurement history through time $t_i$ [130, 132];
the dash-dot lines are at plus and minus one standard deviation.

In practice, it is useful to inspect plot (h) for each of the elemental filters first;
that is why we have included a summary of the (h) plots for the entire filter bank
early in the discussion (namely, in the “initial results” figure discussed on page 5-4).
The “initial results” summary tells us which filter is most responsible for the
overall MMAE performance and it directs our attention to the performance of the
elemental filter with the best hypothesis. [For simulation one, only initial results
are reported; they are given in figures containing sixteen such “initial results” plots
arranged in a four by four array; see, for example, Figure 5.1 on page 5-15.] This
initial results summary does not replace the (h) plots; we still need plot (h) because
it contains more information about the performance: plot (h) tells us how much the
mean hypothesis conditional probability varies over the 50 Monte Carlo runs by
stating the mean plus and minus one standard deviation. After considering plot (h),
we usually look at the likelihood quotient in plot (g). It is highly likely that the
best filter model (the one with the largest mean hypothesis conditional probability)
will also display a sequence of likelihood quotients approximately equal to the number
of sensor segments, while the filters with the lowest mean hypothesis conditional
probability will often have a sequence of likelihood quotients much larger than the number of
sensor segments. Next, the state estimates in plots (a), (c), and (e) are assessed more
fully by the estimation error in plots (b), (d), and (f). Among other attributes of
the elemental filter, the adequacy of the initial state covariance can be checked by
inspecting plots (b), (d), and (f); the initial error should be within the 1σ (i.e., one
standard deviation) bounds created by the initial state covariance. If this is not
true, then convergence is greatly hampered since the filter has been told that its
initial condition errors are much smaller than they really are. Finally, we repeat this
inspection process for the other “less probable” elemental filters.

Plot (g) of, for example, Figure 5.14 on page 5-36 features the likelihood
quotient. For a properly matched elemental filter, we expect a likelihood quotient of
around M = 5. This is not a guarantee that the filter is properly matched because
the filter-computed residual covariance could be masking poor residuals; however, a
likelihood quotient significantly greater than M = 5 is a strong indication that the
filter is mismodeled. The best indication of a filter based on the correct model for the
situation is given by the hypothesis conditional probability in plot (h). When $p_k = 1$,
the MMAE is indicating that the model completely matches reality with probability
one; here, the completeness is relative to the other elemental filters, which are based
on relatively inaccurate models as compared to the elemental filter receiving $p_k = 1$.
When $p_k$ is small, the filter is either mismatched or poorly tuned.
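The expectation that a matched filter yields a likelihood quotient near M follows from $E\{\mathbf{r}^T \mathbf{A}^{-1} \mathbf{r}\} = M$ when the residual is zero-mean with covariance $\mathbf{A}$. The Monte Carlo sketch below illustrates this under simplifying assumptions that are not taken from the simulations in this chapter: the residual covariance is diagonal, the residuals are Gaussian, and a hypothetical `true_scale` factor models a filter whose computed residual covariance is too small.

```python
import random

def mean_likelihood_quotient(variances, true_scale=1.0, runs=20000, seed=0):
    """Average r^T A^-1 r over many draws, for a diagonal filter-computed A.

    variances  : diagonal of the filter-computed residual covariance A
    true_scale : ratio of the actual residual variance to the filter-computed one
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        quotient = 0.0
        for v in variances:
            r = rng.gauss(0.0, (true_scale * v) ** 0.5)   # actual residual sample
            quotient += r * r / v                         # normalized by filter's A
        total += quotient
    return total / runs
```

With five sensors and `true_scale=1.0` the average settles near 5; if the actual residual variance is four times what the filter computes, the average rises to about 20, the kind of inflated quotient that flags a mismodeled filter.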

Improved state estimation is usually the end goal of our adaptive estimation
process, i.e., we seek to produce a better state estimate using an MMAE structure,
rather than a precise estimate of the uncertain parameter itself. However, in Section
5.7, we use the MMAE to estimate the parameter of interest, also known as system
identification. The MMAE state estimator usually outperforms a similar estimator
based on a single elemental filter (based on a single assumed value for the uncertain
parameter). The second set of plots, (i) through (p), for each elemental filter displays
the temperature along the entire length of the rod at selected instances of time from
time zero to the end of the simulation. The state at time zero reflects the initial
conditions, while the state at the end of the simulation includes the final measurement
update. With the exception of the first time instant, all of the times show results
for just after the measurement update, i.e., at time $t_i^+$. Additionally, the root mean
square (RMS) error of the temperature estimate is displayed on each subplot to help
quantify the performance over the entire rod⁴. The solid line represents the filter’s
mean temperature estimate while the dashed line is the true temperature.

The first set of plots for the blended filter$^5$ is similar to plots (a) through (f) for an elemental filter, including the dashed lines in (b), (d), and (f) that reflect the blended filter-indicated zero plus and minus one sigma values. Plot (g) shows the RMS error, as a function of time, for the MMAE blended state estimate. The second set of plots, (h) through (o), for the blended filter is nearly the same as the second set for the elemental filters. While the elemental filter time line begins with the initial conditions, the blended filter time line begins after the MMAE has produced its first estimate of the temperature, i.e., after the first measurement update; the blended filter represents the state estimate computed using Equation (4.272) on page 4-56 and its conditional covariance in Equation (4.273). Thus, a careful inspection of plot (i) for each of the elemental filters and plot (h) for the blended filter will show that they do not, in fact, reflect exactly the same instant of time at $t_i = 0$ sec. However, the remaining temperature snapshots are for the same time instants.
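The blending step can be sketched as follows. Equations (4.272) and (4.273) are not reproduced in this section, so the sketch assumes they take the usual MMAE form: a probability-weighted mean of the elemental estimates, and a covariance that adds the spread of the elemental estimates about that mean. For brevity the state is a scalar (the temperature at one location); the function name `blend_estimates` and the sample numbers are hypothetical.

```python
def blend_estimates(probs, means, covs):
    """Probability-weighted blend of scalar elemental-filter estimates.

    probs: hypothesis conditional probabilities p_k
    means: elemental state estimates (scalars)
    covs:  elemental filter-computed variances (scalars)
    """
    total = sum(probs)
    weights = [p / total for p in probs]  # guard against round-off in sum(p_k)
    x_blend = sum(w * x for w, x in zip(weights, means))
    # Each term adds the filter's own variance and its offset from the blend.
    p_blend = sum(
        w * (var + (x - x_blend) ** 2)
        for w, x, var in zip(weights, means, covs)
    )
    return x_blend, p_blend


if __name__ == "__main__":
    # Hypothetical three-filter bank.
    print(blend_estimates([0.1, 0.7, 0.2], [24.0, 25.0, 27.0], [1.0, 0.5, 2.0]))
```

When one elemental filter holds all of the probability, the blend reduces to that filter's own estimate and variance, which is consistent with the remark that the blended filter is a blending of elemental-filter results rather than a filter in its own right.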

It should not be surprising to the reader to find, on the following pages, better adaptation to uncertainty in the measurement-corruption noise covariance than to uncertainty in the system-dynamics noise strength. Consider, first of all, that models for dynamic system processes are often less precise or less well understood than our measurement models for a particular problem or application. Additionally, while we can calibrate and inspect our measurement apparatus, we often cannot do likewise for the system itself. Furthermore, knowledge of the dynamics is only available to us through the measurement process, a process which adds yet another layer of uncertainty to our state estimation. More specifically, the uncertainties in the measurement-corruption noise $\mathbf{v}$, as in its covariance $\mathbf{R}$, directly impact the measurement residuals since $\mathbf{z} = \mathbf{H}\mathbf{x} + \mathbf{v}$. On the other hand, the effects of dynamics driving noise parameters, such as $\mathbf{Q}$, first impact the system dynamics and (after the inherent delays and other effects of the system itself) are then reflected in the state values as expressed by $\mathbf{H}\mathbf{x}$ in the measurement equation $\mathbf{z} = \mathbf{H}\mathbf{x} + \mathbf{v}$. Thus, the effects of uncertain dynamics model parameters are not as directly viewable as those in the measurement model.

$^4$The RMS error on these plots is an “instantaneous” value and should not be treated as an absolute indicator that can be compared exactly from plot to plot, and especially not between different experiments, nor between cases within a particular simulation, since the results are based on different sets of Monte Carlo runs. The trends, however, are certainly valid.

$^5$The blended filter is not really a filter, but a blending of the data/results from the elemental filters.

Before delving into the results of the simulations, we shall investigate the expected behavior for the likelihood quotient, $L_k(t_i) = \mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)$, where the filter-computed residual covariance, $\mathbf{A}_k(t_i) = \mathbf{H}_k(t_i)\,\mathbf{P}_k(t_i^-)\,\mathbf{H}_k^T(t_i) + \mathbf{R}_k(t_i)$, depends on the elemental filter design model. In the third, fourth, and fifth simulations, we have five elemental filters, all with different measurement-corruption noise covariances. Thus, we have five different $\mathbf{A}_k$ matrices. We can use this knowledge to help us predict what the likelihood quotient will be for all of the elemental filters before we run any simulation. Recall that when an elemental filter matches the true scenario, the history of residuals forms a zero-mean white Gaussian noise sequence with known covariance, namely the filter-computed residual covariance $\mathbf{A}$. In that case the expected likelihood quotient is equal to the number of measurements, $M$, and in practice this is generally what is observed. When the filter is mismatched, however, $L_k$ will usually differ considerably (and often becomes several orders of magnitude larger than the number of measurements).

From Maybeck [129] we know that the zero-mean residual sequence for a Kalman filter based upon the true parameter value has covariance

$$E\{\mathbf{r}(t_i)\,\mathbf{r}^T(t_i) \mid \mathbf{Z}(t_{i-1}) = \mathbf{Z}_{i-1}\} = \mathbf{A}_{\mathrm{true}}(t_i) \qquad (5.1)$$

which we note is independent of the measurement history $\mathbf{Z}_{i-1}$. Now we find the expected value of the random likelihood quotient, $L_k(t_i)$, at time $t_i$ for the $k$th elemental filter

$$E\{L_k(t_i)\} = E\{\mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)\} = \mathrm{tr}\{E[\mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)]\} \qquad (5.2)$$

where $\mathbf{r}_k(t_i)$ is the random measurement residual; additionally, we note that the trace of a scalar function is equal to that scalar function. Moving the trace inside the expectation yields

$$E\{L_k(t_i)\} = E\{\mathrm{tr}[\mathbf{r}_k^T(t_i)\,\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)]\} \qquad (5.3)$$

and, because the trace is invariant under cyclic permutation of its arguments, the terms may be reordered to obtain

$$E\{L_k(t_i)\} = E\{\mathrm{tr}[\mathbf{A}_k^{-1}(t_i)\,\mathbf{r}_k(t_i)\,\mathbf{r}_k^T(t_i)]\} \qquad (5.4)$$

Next we move the expectation back in and get

$$E\{L_k(t_i)\} = \mathrm{tr}\{\mathbf{A}_k^{-1}(t_i)\,E[\mathbf{r}_k(t_i)\,\mathbf{r}_k^T(t_i)]\} \qquad (5.5)$$

Using Equation (5.1)

$$E\{L_k(t_i)\} = \mathrm{tr}\{\mathbf{A}_k^{-1}(t_i)\,\mathbf{A}_{\mathrm{true}}(t_i)\} \qquad (5.6)$$

If, for all time $t_i$, $\mathbf{A}_k(t_i) = \mathbf{A}_{\mathrm{true}}(t_i)$, then the right-hand side of Equation (5.6) becomes the trace of the identity matrix, which is equal to the dimension of the matrix, in this case $M$. Thus, in the simulations considered here, we expect the likelihood quotient to be five, the number of sensor segments, when the filter hypothesis is correct. However, most of the filters are created using an incorrect hypothesis and thus we usually have $\mathbf{A}_k(t_i) \neq \mathbf{A}_{\mathrm{true}}(t_i)$, where we recall that $\mathbf{A}(t_i) = \mathbf{H}(t_i)\,\mathbf{P}(t_i^-)\,\mathbf{H}^T(t_i) + \mathbf{R}(t_i)$. In steady state operation, which usually occurs after just a few propagate/update cycles, we have $\mathbf{A}_k^{ss} \approx \mathbf{R}_k$ and $\mathbf{A}_{\mathrm{true}}^{ss} \approx \mathbf{R}_{\mathrm{true}}$ for the particular example simulated herein. Thus,

$$E\{L_k(t_i)\big|_{t_i = t_{ss}}\} = \mathrm{tr}\{\mathbf{R}_k^{-1}\,\mathbf{R}_{\mathrm{true}}\} \qquad (5.7)$$

$$= \mathrm{tr}\{[R_k\mathbf{I}]^{-1}\,R_{\mathrm{true}}\mathbf{I}\}, \quad \text{where } \mathbf{R}_k = R_k\mathbf{I} \text{ and } \mathbf{R}_{\mathrm{true}} = R_{\mathrm{true}}\mathbf{I} \qquad (5.8)$$

$$= \frac{R_{\mathrm{true}}}{R_k}\,\mathrm{tr}\{\mathbf{I}\} \qquad (5.9)$$

$$= \frac{R_{\mathrm{true}}}{R_k}\,M \qquad (5.10)$$

where we assume that (1) all $M$ of the eigenvalues of $\mathbf{R}_k$ are equal to $R_k$, (2) all $M$ of the eigenvalues of $\mathbf{R}_{\mathrm{true}}$ are equal to $R_{\mathrm{true}}$, and (3) $M$ is the dimension of the square covariance matrices $\mathbf{R}_k$ and $\mathbf{R}_{\mathrm{true}}$ and the identity matrix $\mathbf{I}$. The first equality in the above development is due to the steady state assumption; the fourth equality follows from the definition of the trace of a matrix as the sum of its diagonal elements. Therefore, we can use Equation (5.10) to predict the steady state likelihood quotient for an elemental filter created using a model that differs from the true value of $R$; this will be very valuable for our analysis in Simulations 3, 4, and 5.
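The prediction of Equation (5.10) is easy to check numerically. The following is a minimal sketch under the same assumptions used in the derivation: the residual is zero-mean Gaussian with covariance $R_{\mathrm{true}}\mathbf{I}$ and the filter assumes $R_k\mathbf{I}$. The function names, the sample count, and the particular $R_k$ values in the loop are hypothetical choices for illustration only.

```python
import random


def expected_quotient(r_true, r_k, m):
    """Equation (5.10): steady-state mean of L_k when R = (scalar) * I."""
    return (r_true / r_k) * m


def sampled_quotient(r_true, r_k, m, n_samples=20000, seed=1):
    """Monte Carlo mean of r^T (r_k I)^{-1} r with r ~ N(0, r_true I)."""
    rng = random.Random(seed)
    sigma = r_true ** 0.5
    total = 0.0
    for _ in range(n_samples):
        residual = [rng.gauss(0.0, sigma) for _ in range(m)]
        total += sum(v * v for v in residual) / r_k
    return total / n_samples


if __name__ == "__main__":
    m = 5          # number of sensor segments, as in the text
    r_true = 1.0   # hypothetical true measurement noise variance
    for r_k in (0.01, 0.1, 1.0, 10.0, 100.0):  # hypothetical filter assumptions
        print(r_k, expected_quotient(r_true, r_k, m), sampled_quotient(r_true, r_k, m))
```

A filter that underestimates $R$ by a factor of ten should therefore show a mean likelihood quotient near $10M$, while one that overestimates it by the same factor should show a value near $M/10$.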

In the first five simulations, we investigate state estimation in the presence of an unknown noise environment, and then (in the sixth simulation) we perform system identification. We find that, even when we “poorly” identify the system-dynamics noise strength, the state estimation task fares well. We treat filter bank composition and filter initialization issues for improving adaptation to unknown noise statistics.

5.2 Simulation 1


When  a  Kalman  filter  is  designed  using  mismatched  noise  statistics,  the  filter  is 
no  longer  optimal  [78,  143,  129,  169].  Thus  we  are  motivated  in  this  first  simulation 
to  demonstrate  how  a  bank  of  filters  can  be  used  to  estimate  the  value  of  an  unknown 
parameter  —  the  system  dynamics  driving  noise  strength  Q.  While  we  are  specifically 
calling  attention  to  this  one  parameter,  we  are  really  interested  in  improving  our  state 
estimation  and  identifying  trends  useful  for  filter  bank  composition.  Initial  work  on 
this  simulation  showed  that  it  was  difficult  to  create  a  bank  of  elemental  filters  that 
both  spanned  the  entire  range  of  expected  values  and  appeared  distinct  from  one 
another.  Thus,  before  we  can  concentrate  on  state  estimation,  we  must  first  create 
a  good  bank  of  filters. 

As a precursor to using the MMAE to estimate the dynamics noise strength, $Q$, we conduct a study to find the best discrete set of values to represent the continuum of possible values for $Q$. The objective of this experiment is not to estimate the parameter, but to illustrate the parameter space discretization as indicated by the hypothesis conditional probability time histories. For optimal state and/or parameter estimation performance, the elemental filters in the filter bank must be distinguishable from one another. In addition to being distinct, the elemental filters must be based on a set of parameter values that covers the entire range of expected values for the parameter of interest.

Tables 5.3 and 5.4 contain key truth and filter parameter values$^6$ for the 16-case runs displayed in Figures 5.1 to 5.11. The elemental filter quintet for each case displayed in Figures 5.1 to 5.11 is determined using the data in Table 5.4. As an example, the $Q$ values used to construct the elemental filter quintets for the $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 0.1$ case are reported in Table 5.5; this corresponds to results shown in Figure 5.1. Thus, the far left column of plots contains filter banks centered on the true value of $Q$, while the second column (from the left) contains filter banks centered on twice the true value of $Q$. The top row contains filter banks with $Q$ values spaced a decade apart, while the bottom row of filter banks is spaced by two orders of magnitude. Each of the subplots for each case represents the average results for fifty Monte Carlo runs; thus, each of the figures from Figure 5.1 to 5.11 contains the probability flow results for 800 Monte Carlo runs. Finally, the discretization method chosen for this experiment is reminiscent of the simple logarithmic spacing of the elemental filters proposed in Section 2.3.3.

$^6$In Table 5.4, as in the rest of the tables in this chapter, important values that are constant across the elemental filters in the bank are only listed once. For example, all five elemental filters in Table 5.4 are designed using the same $R$, $\hat{x}_0$, and $P_0$ values, whereas the $Q$ is different for each elemental filter.

Figure   N    Q_true            R_true       R_filter
5.1      30   10 (°C)²/sec      0.1 (°C)²    0.1 (°C)²
5.2      30   100 (°C)²/sec     1 (°C)²      1 (°C)²
5.3      30   50 (°C)²/sec      1 (°C)²      1 (°C)²
5.4      30   20 (°C)²/sec      1 (°C)²      1 (°C)²
5.5      30   10 (°C)²/sec      1 (°C)²      1 (°C)²
5.6      40   10 (°C)²/sec      1 (°C)²      1 (°C)²
5.7      50   10 (°C)²/sec      1 (°C)²      1 (°C)²
5.8      60   10 (°C)²/sec      1 (°C)²      1 (°C)²
5.9      70   10 (°C)²/sec      1 (°C)²      1 (°C)²
5.10     30   100 (°C)²/sec     1 (°C)²      10 (°C)²
5.11     30   100 (°C)²/sec     10 (°C)²     1 (°C)²

Table 5.3 Simulation 1: Four key parameters for the filter bank composition experiment. N represents the order of the system model.

Before we begin our analysis, we shall take a tour of Figure 5.1, the first of eleven such figures. In each column of plots, going down a column corresponds to increasing the coarseness of the discretization; thus we expect that the proper distinguishability of the elemental filters will increase, as demonstrated by an increase in the share of the probability received by the elemental filter designed with the most appropriate value for the dynamics noise strength. The trend observed by looking across a row of plots is not always as straightforward since it is somewhat dependent on the discretization level. In the first column of plots, the third (or central) elemental filter in the bank of five filters is designed with artificial knowledge of the true dynamics noise strength; in the second, third, and fourth columns, the central filter is designed for $2Q_{\mathrm{true}}$, $5Q_{\mathrm{true}}$, and $10Q_{\mathrm{true}}$, respectively. Thus, for the case of $(c, d) = (10, 10)$, the second elemental filter assumes (correctly) that the noise strength is $Q_{\mathrm{true}}$, while the central filter is no longer the best filter since it was designed for $10Q_{\mathrm{true}}$. More generally, the central filter becomes less probable as the multiplicative offset (dictated by $c$) increases, while the second elemental filter becomes more probable since its underestimate of the noise strength becomes small relative to the overestimate assumed by the central filter.

Filter   Q_filter          R_filter   x_0   P_0
1        (c/d²) Q_true
2        (c/d) Q_true
3        c Q_true          R_true     20    10
4        c d Q_true
5        c d² Q_true

Table 5.4 Simulation 1: Elemental filter parameters for the filter bank composition experiment. c represents the centering of parameter values used as a “basis” for the elemental filters in the bank; d pertains to the discretization of the parameter set. Note that R_filter for the cases in Figures 5.10 and 5.11 was over- and underestimated by a factor of ten, respectively.

          c = 1                            c = 2                           c = 5                           c = 10
d = 10    1/10, 1, 10, 100, 1000           1/5, 2, 20, 200, 2000           1/2, 5, 50, 500, 5000           1, 10, 100, 1000, 10000
d = 20    1/40, 1/2, 10, 200, 4000         1/20, 1, 20, 400, 8000          1/8, 5/2, 50, 1000, 20000       1/4, 5, 100, 2000, 40000
d = 50    1/250, 1/5, 10, 500, 25000       1/125, 2/5, 20, 1000, 50000     1/50, 1, 50, 2500, 125000       1/25, 2, 100, 5000, 250000
d = 100   1/1000, 1/10, 10, 1000, 100000   1/500, 1/5, 20, 2000, 200000    1/200, 1/2, 50, 5000, 500000    1/100, 1, 100, 10000, 1000000

Table 5.5 Dynamics noise strengths for each of the elemental filters shown in the 16 plots of Figure 5.1, where $Q_{\mathrm{true}} = 10$ and $R_{\mathrm{true}} = 0.1$. Note that only the first column (and the top right case) have the true value for $Q$ included in the bank of elemental filters. Additionally, the last four elemental filters shown in the top left plot are constructed using the same model as the first four elemental filters of the top right case.

[Figure 5.1: 4 × 4 array of probability history plots; columns c = 1, 2, 5, 10; horizontal axes: time (s), 0 to 1.]

Figure 5.1 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 0.1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.
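The quintet construction of Table 5.4 can be sketched in a few lines. The function names below are hypothetical. The helper `closest_filter` uses logarithmic distance to $Q_{\mathrm{true}}$ as a stand-in for “most correct model”; that criterion is an assumption made for illustration, whereas the text judges correctness from the observed probability flow.

```python
import math


def filter_bank_q(q_true, c, d):
    """Dynamics noise strengths for the five elemental filters of Table 5.4."""
    base = c * q_true  # value assumed by the central (third) filter
    return [base / d ** 2, base / d, base, base * d, base * d ** 2]


def closest_filter(q_true, bank):
    """Zero-based index of the filter whose Q is nearest q_true on a log scale."""
    return min(range(len(bank)), key=lambda k: abs(math.log(bank[k] / q_true)))


if __name__ == "__main__":
    q_true = 10.0  # the Figure 5.1 case
    for d in (10, 20, 50, 100):
        for c in (1, 2, 5, 10):
            bank = filter_bank_q(q_true, c, d)
            print(c, d, bank, closest_filter(q_true, bank) + 1)
```

For $(c, d) = (10, 10)$ the nearest filter is the second one, in agreement with the discussion of the top right case.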


Furthermore, it may be helpful to consider the following simple mnemonic for associating the markers on the hypothesis conditional probability history plots with their respective elemental filters: The first filter symbol is a single curved line in the shape of a circle, ○. Next, we employ two crossed line segments, ×, to represent the second elemental filter. The three angles in a triangle, △, mark the graph for elemental filter three. Elemental filter four uses the four-sided square, □. Finally, elemental filter five employs the five-pointed star, ★, to annotate its probability history.

[Figure 5.2: array of probability history plots; rows indexed by d; columns c = 1, 2, 5, 10.]

Figure 5.2 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 100$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 100c/d^2$, Filter 2 (×): $Q_2 = 100c/d$, Filter 3 (△): $Q_3 = 100c$, Filter 4 (□): $Q_4 = 100cd$, Filter 5 (★): $Q_5 = 100cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


There are several trends regarding filter bank composition that can be readily seen by inspecting the plots in Figures 5.1 through 5.11. Using the plots in these eleven figures, we shall graphically exhibit three trends which feature increased probability for the elemental filter based on the most correct model: increasing $d$ (i.e., increasing the coarseness of the parameter discretization), increasing the $Q/R$ ratio, where $Q$ is the dynamics noise strength and $R$ is the measurement-corruption noise covariance, and/or increasing the order of the model, $N$. Our first choice may

[Figure 5.3: array of probability history plots; rows include d = 20, 50, 100; horizontal axes: time (s), 0 to 1.]

Figure 5.3 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 50$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 50c/d^2$, Filter 2 (×): $Q_2 = 50c/d$, Filter 3 (△): $Q_3 = 50c$, Filter 4 (□): $Q_4 = 50cd$, Filter 5 (★): $Q_5 = 50cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


be to increase the coarseness of the discretization, but this is a tradeoff that exchanges quality of the state estimate for improved parameter estimation; a moving bank structure (see the discussion in Section 2.6) may help ease these tradeoff costs. When increasing $N$ is not affordable computationally, perhaps we can tune the filters and achieve a more favorable probability flow to what we believe may be the most appropriate elemental filter. As we might expect, these results are dependent on correct assumptions for the other model parameters. In the filter-bank results

Figure 5.4 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 20$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 20c/d^2$, Filter 2 (×): $Q_2 = 20c/d$, Filter 3 (△): $Q_3 = 20c$, Filter 4 (□): $Q_4 = 20cd$, Filter 5 (★): $Q_5 = 20cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


shown  in  Figures  5.1  through  5.9,  we  have  taken  for  granted  that  we  have  accurately 
accounted  for  all  of  the  non-Q  model  parameter  values,  while  in  Figures  5.10  and 
5.11,  we  have  intentionally  based  the  bank  of  filters  on  an  incorrect  value  for  the 
true  R.  Finally,  by  choosing  a  higher-order  model,  we  can  regain  some  of  the  fidelity 
that  we  traded  off  earlier. 

In general, as the discretization of the parameter space becomes coarser (i.e., as $d$ increases), the elemental filter based upon the most correct model receives a larger

Figure 5.5 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


share of the probability; this trend can be seen repeatedly in nearly all of the plots in Figures 5.1 through 5.11. A particularly good example of this trend can be seen in the far left (or first) column of plots in Figure 5.4 on page 5-18; the third filter (represented by the triangles) is based on the true values of the parameters and, as we can see, it gets a larger probability as we increase the coarseness of the discretization down the column from $d = 10$ to 20 to 50, and finally $d = 100$. In the $(c, d) = (1, 20)$ plot, the third and fourth (represented by the squares) elemental filters are virtually

Figure 5.6 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 40$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


indistinguishable. Increasing the discretization to the $d = 50$ level, the third filter gains the clear majority of the probability. So, while the distinguishability of the elemental filters appears to degrade before it eventually improves, the probability flow to the elemental filter based on the most correct model clearly increases with an increasingly coarse level of discretization. Hence, as the discretization of the parameter space becomes coarser, the elemental filters become more properly distinguishable as the most appropriately modeled filter gains a larger share of the probability.

Figure 5.7 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 50$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


The relative ratio of $Q/R$ can be seen to influence the behavior of the bank of filters directly, as seen in these probability plots. When the ratio $Q/R$ is very high (see Figures 5.1 and 5.2), the elemental filter based on the most correct model routinely gets the bulk of the probability. As we can see, the arrays of plots in these two figures are nearly identical; this is because they have the same $Q/R$ ratio. On the other hand, halving the ratio $Q/R$ from 100 to 50 for the cases shown in Figures 5.2 and 5.3 shows quite a different result for the “finely” discretized $d = 10$

Figure 5.8 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 60$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


top row of plots. Specifically, the third (second) and fourth (third) elemental filters are approaching indistinguishability in the centered $c = 1$ ($c = 10$, far right column) case. Thus, in terms of probability flow, the relative ratio of $Q/R$ matters more than the particular values for $Q$ and $R$. The ratio is important since $Q/R$ dictates the steady state Kalman gain for each elemental filter: the steady state Kalman gain increases with the noise strength $Q$ and decreases with the noise covariance $R$.

Figure 5.9 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 70$, $Q_{\mathrm{true}} = 10$, $R_{\mathrm{true}} = 1$. Filter 1 (○): $Q_1 = 10c/d^2$, Filter 2 (×): $Q_2 = 10c/d$, Filter 3 (△): $Q_3 = 10c$, Filter 4 (□): $Q_4 = 10cd$, Filter 5 (★): $Q_5 = 10cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.
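The dependence of the steady state Kalman gain on the ratio $Q/R$, rather than on $Q$ and $R$ separately, can be illustrated with a deliberately simple stand-in model. The sketch below assumes a scalar random walk, $x_{i+1} = x_i + w_i$ with $z_i = x_i + v_i$, where `q` is the discrete-time variance of $w_i$ and `r` the variance of $v_i$. This is not the rod model of this chapter, and the function name `steady_state_gain` is hypothetical. For this model the steady state prior variance solves $P^2 - QP - QR = 0$.

```python
import math


def steady_state_gain(q, r):
    """Steady-state Kalman gain for a scalar random walk observed in noise."""
    ratio = q / r
    # Positive root of (P/R)^2 - ratio * (P/R) - ratio = 0.
    p_over_r = 0.5 * (ratio + math.sqrt(ratio * ratio + 4.0 * ratio))
    # Gain K = P / (P + R), written in terms of P/R only.
    return p_over_r / (p_over_r + 1.0)


if __name__ == "__main__":
    # The (Q, R) pairs of Figures 5.1, 5.2 and 5.3, used here only as inputs.
    for q, r in ((10.0, 0.1), (100.0, 1.0), (50.0, 1.0)):
        print(q, r, steady_state_gain(q, r))
```

In this stand-in model the first two pairs, which share $Q/R = 100$, yield the same gain, while the third pair with $Q/R = 50$ yields a smaller one.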


Assuming we have chosen $R$ wisely, it follows that when $Q/R$ is low, we have chosen a value too low for our assumed $Q$. Hence, “overestimating $Q$” at this point may well lead us to a value for $Q$ that is about right, and thus this elemental filter receives the majority of the probability for a filter bank constructed with a relatively fine discretization level; see, for example, the fourth elemental filter (represented by the squares) in the top left plot in Figure 5.4. Scrolling down the first column of plots shows that, as the discretization becomes coarser, the “correct” filter

Figure 5.10 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 100$, $R_{\mathrm{true}} = 1$, $R_{\mathrm{filter}} = 10$. Filter 1 (○): $Q_1 = 100c/d^2$, Filter 2 (×): $Q_2 = 100c/d$, Filter 3 (△): $Q_3 = 100c$, Filter 4 (□): $Q_4 = 100cd$, Filter 5 (★): $Q_5 = 100cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


(the elemental filter based on the most correct hypothesis, which is here indicated by the triangles since it is the third filter in the filter bank) receives an increasingly larger share of the probability, while the filter featuring an overestimate for $Q$ (squares) receives a correspondingly smaller share of the probability. Looking at the second plot, $(c, d) = (1, 20)$, we see that the third and fourth elemental filters are virtually indistinguishable, as has been previously noted. As the discretization coarsens further to $d = 50$ in the third plot of the first column, the filters are once

Figure 5.11 Simulation 1: Filter bank composition experiment. Hypothesis conditional probability histories for 16 cases of interest for $N = 30$, $Q_{\mathrm{true}} = 100$, $R_{\mathrm{true}} = 10$, $R_{\mathrm{filter}} = 1$. Filter 1 (○): $Q_1 = 100c/d^2$, Filter 2 (×): $Q_2 = 100c/d$, Filter 3 (△): $Q_3 = 100c$, Filter 4 (□): $Q_4 = 100cd$, Filter 5 (★): $Q_5 = 100cd^2$. To make these small plots more legible, the mean hypothesis conditional probabilities are presented for only times $\{t_0, t_3, t_6, \ldots, t_{99}\}$ versus all of $T = \{t_0, t_1, t_2, \ldots, t_{100}\}$.


again  distinguishable;  the  third  filter,  constructed  using  the  true  value  for  Q,  gathers 
the  majority  of  the  probability.  This  sequence  of  events  occurs  several  times,  as  seen 
in  Figures  5.4  and  5.5. 

In the second column of plots, the “best” elemental filter is still the third filter (triangles); it was designed using a dynamics noise strength of twice $Q_{\mathrm{true}}$. For the cases displayed in Figures 5.1 to 5.3, the ratio $Q/R$ is relatively large and the best filter is chosen every time. As $Q/R$ is reduced further, as in Figures 5.4 and 5.5, we see that the $d = 10$ discretization level is too fine and results in probability being shared with an elemental filter featuring an even larger assumed value for $Q$ in both figures, while in Figure 5.5, the fourth filter appears to be the best match. For both cases, as the discretization is made coarser by increasing $d$, the hypothesis put forth by the third filter gathers the highest probability in the filter bank.

We  have  seen  in  Figures  5.1  to  5.5  how  the  Q/R  ratio  and  the  discretization  of 
the  dynamics  noise  strength,  Q ,  have  generally  affected  the  probability  flow  to  the 
elemental filters. Now, let's see how the distinguishability and probability flow are
affected by the order of the model, N. In Figures 5.5 to 5.9, we have gradually increased
the order of the model to the point where the third (most correctly modeled) filter is
once again clearly dominant, nearly as good as it was for the case when Q/R = 100
as seen in Figures 5.1 and 5.2. If you recall our discussion early in this chapter on
the state transition matrix, it is quite remarkable to see the accumulation of effects
contributed by the 31st to 70th "states". Recall that
$[\Phi(t_{i+1} - t_i)]_{30} = 2.6 \times 10^{-39}$, while another computation gives
$[\Phi(t_{i+1} - t_i)]_{70} = 9.3 \times 10^{-211}$. Thus, the states
corresponding to large N values, i.e., N > 30, which are nearly driven to zero during
each propagation cycle, are still quite important!
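
The two magnitudes quoted above can be cross-checked with a short sketch. It assumes that the n-th diagonal entry of the state transition matrix decays like exp(-c n^2), as one would expect for heat-conduction modes; that functional form and the fitted constant c are assumptions made for illustration, and only the two quoted numbers come from the text.

```python
import math

# Values quoted in the text for the 30th and 70th diagonal entries of the
# state transition matrix over one sample period.
phi_30 = 2.6e-39
phi_70 = 9.3e-211

# ASSUMPTION (not from the text): the n-th entry has the form exp(-c * n**2).
# Fit c from the n = 30 value, then predict the n = 70 value in log space.
c = -math.log(phi_30) / 30**2
log_phi_70_predicted = -c * 70**2

# If the assumed form holds, the ratio of the logs equals (70/30)**2.
log_ratio = math.log(phi_70) / math.log(phi_30)

print(f"fitted c             = {c:.5f}")
print(f"ln(phi_70) predicted = {log_phi_70_predicted:.2f}")
print(f"ln(phi_70) quoted    = {math.log(phi_70):.2f}")
print(f"log ratio vs (7/3)^2 = {log_ratio:.4f} vs {(70 / 30) ** 2:.4f}")
```

Under this assumed form the two quoted values agree to within the rounding of the published digits, which is consistent with each higher "state" being almost entirely damped out in a single propagation cycle.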

In  the  previous  comments,  we  have  implicitly  assumed  that  our  models  were 
based  on  the  true  measurement  noise  covariance.  While  overestimation  of  the  true 
measurement  noise  covariance,  as  seen  in  Figure  5.10,  slows  the  probability  flow  to 
the  elemental  filter  based  on  the  correct  model,  underestimating  the  covariance,  as 
seen  in  Figure  5.11  “causes”  the  filters  to  attribute  the  increased  (and  unexpected) 
errors  to  the  dynamics  noise  strength.  The  real  problem  is  that  the  state  estimation 
suffers  in  both  cases,  devastatingly  for  the  underestimation  case. 

Now  that  we  have  graphically  seen  how  the  discretization  level,  d,  the  relative 
quality  of  the  dynamics  and  measurement  models,  Q/R  ratio,  and  the  order  of  the 
model,  N,  affect  the  distinguishability  of  the  elemental  filters,  we  shall  investigate 
the effects of filter initialization of the state estimate, x, and the state covariance,
P,  in  the  next  simulation. 

5.3  Simulation 2

The first five simulations feature adaptation to an uncertain noise environment
and have an initial state estimate x0 of 25 °C. However, Table 5.1 on page 5-3 gives
the true value as 20 °C; hence there is a 5 °C bias in our initial state estimate. In
a real-world scenario, we might not know that our initial state estimate was off by
5 °C; thus, it is entirely possible that we would set the initial state covariance, P0,
too low. In Table 5.4, we have set the initial state covariance estimate, P0, equal to
25 (°C)^2 to account for the fact that we know that there is a bias on the order of
5 °C in our initial state estimate.

Our goal in this simulation is to improve state estimation performance by
adapting to an unknown system dynamics noise strength; we assume (however
imperfectly) that the other system parameters and noise statistics are completely known,
with the exception that we consider what happens when we fail to set the initial
state covariance, P0, properly. If the filter-assumed initial state covariance is too
small for the actual difference between the true x0 and the filter-assumed x0, then
the filter will not properly adjust the gain and will weight the initial measurements
too lightly. On the other hand, when P0 is too large, the filter becomes too responsive
to initial measurements and disregards the system dynamics which allow us to propagate
the state estimate since the last measurement update. So, when Q is accurately
estimated by the MMAE (and we have properly set the initial state covariance), then
we obtain state estimation performance approaching that of a single Kalman filter
with artificial knowledge of the correct parameter values.

The truth model parameters are given in Table 5.1. Some of the design parameters
for the three elemental filters of the MMAE are included in Table 5.6. Note
that Qtrue is the true value of the dynamics noise strength as listed in Table 5.1.

Case  Filter  Q                      R      x0 (°C)  P0 ((°C)^2)

1     1       (1/100)Qtrue = 1/20    Rtrue  25       P0poor = 1
      2       Qtrue = 5              Rtrue  25       P0poor = 1
      3       100Qtrue = 500         Rtrue  25       P0poor = 1

2     1       (1/100)Qtrue = 1/20    Rtrue  25       P0good = 25
      2       Qtrue = 5              Rtrue  25       P0good = 25
      3       100Qtrue = 500         Rtrue  25       P0good = 25

3     1       (1/10)Qtrue = 1/2      Rtrue  25       P0poor = 1
      2       10Qtrue = 50           Rtrue  25       P0poor = 1
      3       1000Qtrue = 5000       Rtrue  25       P0poor = 1

4     1       (1/10)Qtrue = 1/2      Rtrue  25       P0good = 25
      2       10Qtrue = 50           Rtrue  25       P0good = 25
      3       1000Qtrue = 5000       Rtrue  25       P0good = 25

Table 5.6 Simulation 2: Elemental filter parameters for the initial state error
covariance experiment. Cases 1 and 2 feature a filter bank centered on the true value
of Q, while Cases 3 and 4 are for an arrangement that overestimates the true value
of Q (centered on 10Qtrue rather than Qtrue itself). Note that the assumed x0 value
is 5 °C greater than the true x0 of 20 °C.

Since the true value for x0 is 20 °C versus the 25 °C assumed for the filters in this
simulation, a poor choice for the initial state covariance P0 would be P0poor = 1,
while a good choice would be P0good = 25. The poor choice, P0poor = 1, reflects a poor
assessment of the initial state estimate bias: the actual value of x0 is at a 5σ point
according to the filter-assumed P0. An undersized initial state covariance degrades
the filter's ability to converge on a good estimate because it essentially directs the
filter to underweight the measurement updates by keeping the gain low. Thus, the
responsiveness of the elemental filter is inhibited by a poor choice for the initial state
covariance. The filter can generally recover from this error; however, it takes several
propagate/update cycles for this state error to settle out. A good choice for the
initial state covariance (P0good = 25) accurately portrays the size of the initial state
estimate bias.
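
The "5σ point" remark and the low-gain argument can be made concrete with a small scalar sketch. The initial temperatures and the two P0 values come from the text; the scalar measurement model with H = 1 and the measurement noise variance R_assumed are assumptions chosen only for illustration and are not the rod model used in the simulation.

```python
import math

x0_true = 20.0      # true initial temperature, deg C (from the text)
x0_filter = 25.0    # filter-assumed initial temperature, deg C (from the text)
R_assumed = 5.0     # ASSUMPTION: scalar measurement noise variance, illustration only

def sigma_distance(p0):
    """Initial bias expressed in filter-assumed standard deviations."""
    return abs(x0_filter - x0_true) / math.sqrt(p0)

def first_update_gain(p0, r=R_assumed):
    """Scalar Kalman gain K = P / (P + R) with H = 1 (ASSUMPTION), no propagation."""
    return p0 / (p0 + r)

for label, p0 in (("poor", 1.0), ("good", 25.0)):
    print(f"P0{label} = {p0:4.0f}: bias = {sigma_distance(p0):.0f} sigma, "
          f"first gain = {first_update_gain(p0):.3f}")
```

With the undersized covariance the bias sits five assumed standard deviations out and the first gain is small, so early measurements are underweighted; with the larger covariance the bias is a one-sigma event and the first gain is much closer to one.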
Figure 5.12 Simulation 2: System dynamics noise strength. Legend: O elemental
filters for Cases 1 and 2; B elemental filters for Cases 3 and 4; ★ true parameter
within the filter bank. (The filter spacing is nonlinear for illustration purposes.)

The relative spacing (on a log scale) for the assumed dynamics noise strength
for the two sets (four cases) of three elemental filters is displayed in Figure 5.12.
Additionally, successive members of the list of three filters are separated by two
orders of magnitude in Q, and have the same R and the same initial temperature bias and
state covariance for each case. The large separation is due to the indistinguishability
of filters for closely spaced values of Q as shown in the previous experiment. Since
the effects of the dynamics noise strength are ascertained using measurements, the level
of the measurement-corruption noise covariance has an impact on how well we can
estimate Q. Consequently, for better measurements ("smaller" R), we can get a
better fix on Q.
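
The two filter banks described above can be generated with a few lines. This is a minimal sketch: the helper name filter_bank and its arguments are hypothetical, while the value Qtrue = 5 and the two-decade spacing are taken from Table 5.6.

```python
import math

Q_TRUE = 5.0  # true dynamics noise strength quoted in Table 5.6

def filter_bank(q_center, decades_apart=2, n_filters=3):
    """Return n_filters (odd) Q values centered on q_center on a log scale."""
    half = (n_filters - 1) // 2
    return [q_center * 10.0 ** (decades_apart * j) for j in range(-half, half + 1)]

bank_cases_1_2 = filter_bank(Q_TRUE)         # centered on Qtrue
bank_cases_3_4 = filter_bank(10.0 * Q_TRUE)  # centered on 10 Qtrue
print(bank_cases_1_2)
print(bank_cases_3_4)
```

A moving-bank or finer discretization would only change decades_apart and n_filters in this sketch; the spacing of two decades mirrors the choice made for this experiment.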

As expected, the elemental filter designed for a model that "slightly" overestimates
the true noise strength, Qtrue, matches the real world the best, as indicated by
its high mean hypothesis conditional probability, p2, relative to the other elemental
filters in the filter bank (p1 and p3); compare Figure 5.13(a) to (c) for the poor P0
models and Figure 5.13(b) to (d) for the good P0 models when the Q/R ratio is
near unity as seen previously in simulation one. Recall that the hypothesis conditional
probability, pk, increases for the elemental filter whose filter-computed residual
covariance is most in consonance with the actual covariance of the measurement
residuals.⁷ In this example, elemental filter 2 is the most properly modeled
filter and its probability tends towards one the strongest in Case 3, as seen in Figure

⁷ It has been shown [94, 129] that the sequence of residuals $\{r_k(t_i)\}$ resulting from linear filtering
in additive noise forms a zero-mean white Gaussian sequence with known residual covariance $A_k(t_i)$.
Thus, if a filter model matches the "true" system, then the residual $r_k(t_i)$ will be a zero-mean white
Gaussian process with known residual covariance $A_k(t_i)$.
Figure 5.13 Simulation 2: Initial state covariance experiment, hypothesis conditional
probability flow. (a) Case 1: P0poor, {(1/100)Qtrue, Qtrue, 100Qtrue}. (b) Case
2: P0good, {(1/100)Qtrue, Qtrue, 100Qtrue}. (c) Case 3: P0poor, {(1/10)Qtrue, 10Qtrue, 1000Qtrue}.
(d) Case 4: P0good, {(1/10)Qtrue, 10Qtrue, 1000Qtrue}. To make these plots clearer, only the
mean hypothesis conditional probabilities for times {t0, t2, ..., t100} are displayed.


5.13(c), while, at the same time, the other elemental filter hypothesis conditional
probabilities tend toward zero. The good probability flow for Case 3 is due to a
combination of circumstances. An elemental filter that has "slightly" overestimated
the dynamics noise strength can oftentimes compensate for an inadequate initialization
of the state covariance. On the other hand, an elemental filter with too low an
assumed Q is quickly recognized by the MMAE as a mismodeled filter since its rate
of convergence is not increased by a large assumption for Q. While a poor setting
for P0 degrades state estimation performance, this may have a beneficial impact on
parameter estimation when the best elemental filter design is one that assumes a
slightly too large value for the parameter of interest, i.e., for the case when we do
not have an elemental filter based on the true parameter value; this research did
not address this possible "enhancement" any further.

The relatively poor distinguishability of the trio of filters in plots (a) and (b)
of Figure 5.13 is due, in part, to the low Q/R ratio; this effect was seen previously
in the far left column of plots in Figure 5.5. The probability flow in Case 2 is better
than for Case 1 because of a better assumption for P0. The poor P0 in Case 1 serves
to obscure the best elemental filter initially because the too large assumption for
Q combined with the too small P0 for elemental filter 3 yields what appears to the
MMAE as a good assessment of the error covariance; however, elemental filter 2
eventually absorbs the probability initially given to elemental filter 3 as it "flushes"
out the poor initialization of the error covariance given by P0.

On the other hand, the elemental filters shown in plots (c) and (d) of Figure
5.13 are more distinguishable because the best match is a filter based on an
overestimate for the dynamics noise strength; the far right column of plots in Figure 5.5 is
a good example of how the overestimate of the noise strength can lead to increased
distinguishability between filters.

Now let's look a little closer at general trends evident for elemental filter
1. When the (simulated) real-world noise strength exceeds the hypothesized noise
strength, the filter's residuals look bad and consequently, the likelihood quotient
$r_1^T(t_i) A_1^{-1}(t_i) r_1(t_i)$ grows larger; Equation (5.6) informs us that we should expect
the likelihood quotient, $E\{L_1(t_i)\} = \mathrm{tr}\{A_1^{-1}(t_i) A_{\text{true}}(t_i)\}$, to grow larger as the true
residual covariance "increases" relative to the assumed filter-computed residual
covariance. Therefore, the probability that the elemental filter is based on a good
model, one that accurately reflects the real world, decreases. In all four plots of Figure
5.13, elemental filter 1 is clearly not based on the best model; this is most evident
in plots (a) and (c), while in plots (b) and (d), the higher quality, "good", estimate
for P0 somewhat masks the much too low value for Q.
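
The way a large likelihood quotient drives probability away from a filter can be sketched with the standard Bayesian update of the hypothesis conditional probabilities, here in scalar form (one measurement per sample). The function names and all numerical values are hypothetical; in the actual MMAE each elemental filter produces its own residual, whereas this sketch feeds the same residual to all three filters purely to isolate the effect of the assumed residual variance.

```python
import math

def gaussian_density(residual, a_k):
    """Scalar (m = 1) zero-mean Gaussian density with variance a_k."""
    quotient = residual * residual / a_k          # likelihood quotient r^T A^-1 r
    return math.exp(-0.5 * quotient) / math.sqrt(2.0 * math.pi * a_k)

def update_probabilities(priors, residuals, residual_covs):
    """One Bayesian update of the hypothesis conditional probabilities p_k."""
    weighted = [gaussian_density(r, a) * p
                for r, a, p in zip(residuals, residual_covs, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical numbers: filter 1 assumes a far smaller residual variance
# than filters 2 and 3, so the same residual looks "bad" to it.
priors = [1.0 / 3.0] * 3
residuals = [3.0, 3.0, 3.0]
residual_covs = [0.5, 6.0, 500.0]
posteriors = update_probabilities(priors, residuals, residual_covs)
print([round(p, 3) for p in posteriors])
```

In this sketch the filter with the undersized variance loses nearly all of its probability in a single update, while the filter with the grossly oversized variance is penalized only through the normalizing term, which is why it retains some probability.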

When the filter bank is centered on the true dynamics noise strength, see plots
(a) and (b) in Figure 5.13, the low Q/R ratio generally produces marginally
distinguishable filters. Additionally, the quality of the initial state covariance estimate,
P0, strongly influences the initial performance of the filters. The quality of the initial
estimate determines how long it takes the filter to recover from a poor estimate. As
the state covariance converges to its true value, the second elemental filter absorbs
the majority of the probability. At the 1-second mark, elemental filter 2 for both
cases, shown in plots (a) and (b), gains roughly three fifths of the probability. In plot
(a), the third elemental filter initially dominates because the poor (underestimate
of) P0 nicely balances the overestimate of the dynamics noise strength. This
unfortunate effect can be seen clearly by rewriting the filter-computed residual covariance,
$A_k(t_i) = H_k(t_i) P_k(t_i^-) H_k^T(t_i) + R_k(t_i)$, using the expression for the propagated state
covariance to give (without the k subscripts)

$$A(t_i) = H(t_i)\left[\Phi(t_i, t_{i-1})\, P(t_{i-1}^+)\, \Phi^T(t_i, t_{i-1}) + Q_d(t_{i-1})\right] H^T(t_i) + R(t_i) \tag{5.11}$$
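
The balancing act expressed by Equation (5.11) can be illustrated in scalar form. This is a sketch under stated assumptions: phi = h = 1 and every numerical value below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def residual_variance(p_prev, q_d, r, phi=1.0, h=1.0):
    """Scalar form of Equation (5.11): A = h * (phi * P * phi + Qd) * h + R."""
    return h * (phi * p_prev * phi + q_d) * h + r

R = 5.0  # ASSUMPTION: illustrative measurement noise variance

a_reference = residual_variance(p_prev=25.0, q_d=5.0, r=R)    # honest P, modest Qd
a_too_small = residual_variance(p_prev=1.0, q_d=5.0, r=R)     # undersized P, same Qd
a_compensated = residual_variance(p_prev=1.0, q_d=29.0, r=R)  # undersized P, inflated Qd
print(a_reference, a_too_small, a_compensated)
```

Because P and Qd enter the bracket additively, an inflated Qd can reproduce the residual variance that an honest P would have given, which is how an overestimated noise strength can temporarily hide an undersized initial covariance.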

In plot (b), the low Q/R ratio combined with a good P0 estimate creates marginally
distinguishable filters, which eventually become more distinguishable, and the
properly modeled second elemental filter gains the majority of the probability.

In plots (c) and (d) of Figure 5.13, we show the results for the case when
the filter bank does not contain an elemental filter that matches the true values
assumed by the simulation. In this case, the elemental filter which looks the best,
i.e., the elemental filter with filter-computed residual covariance that is most in
consonance with the true residual covariance, is the one which slightly overestimates
the dynamics noise strength. To illustrate, consider three elemental filters, numbered
1, 2, and 3, based on higher Q values as the index number increases. For elemental
filter 1 based on a too low Q value, the likelihood quotient $r_1^T(t_i) A_1^{-1}(t_i) r_1(t_i)$ will
grow much larger than m (because the measurement residuals, $r_1(t_i)$, are much larger
than anticipated). On the other hand, for elemental filter 3 based on too large a Q
value, $r_3^T(t_i) A_3^{-1}(t_i) r_3(t_i)$ won't be as large because the filter-computed covariance
$A_3(t_i)$ is so much larger. If the true value $A_{\text{true}}(t_i)$ increases over time (in a ramp
fashion, for example), then when $A_{\text{true}}(t_i) = A_2(t_i)$, elemental filter 2 should absorb
most of the probability. However, when $A_{\text{true}}(t_i)$ gets larger than $A_2(t_i)$ (even by
a small amount), $r_k^T(t_i) A_k^{-1}(t_i) r_k(t_i)$ can become significantly larger than m, with
the result that the probability flows to the elemental filter 3, even if $A_3(t_i)$ is much
larger than $A_{\text{true}}(t_i)$.
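
The asymmetry between underestimating and overestimating the residual covariance can be checked with the expected Gaussian log-likelihood in the scalar case (m = 1). The function names and the value A_TRUE are hypothetical; the expected quotient uses the trace relation cited from Equation (5.6), reduced to a ratio of scalars.

```python
import math

def expected_quotient(a_k, a_true):
    """Scalar form of tr{A_k^-1 A_true}: expected likelihood quotient (m = 1)."""
    return a_true / a_k

def expected_log_likelihood(a_k, a_true):
    """Expected Gaussian log-density of a residual with true variance a_true,
    evaluated under the filter-assumed variance a_k (scalar case)."""
    return -0.5 * (math.log(2.0 * math.pi * a_k) + expected_quotient(a_k, a_true))

A_TRUE = 1.0  # hypothetical true residual variance
for factor in (0.1, 1.0, 10.0):
    a_k = factor * A_TRUE
    print(f"A_k = {factor:>4} x A_true: "
          f"E[quotient] = {expected_quotient(a_k, A_TRUE):5.1f}, "
          f"E[log-likelihood] = {expected_log_likelihood(a_k, A_TRUE):6.2f}")
```

The matched filter scores best, and a tenfold overestimate is penalized far less than a tenfold underestimate; this is consistent with probability flowing to the next higher filter once the true covariance exceeds the assumed one.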

If a chosen discretization is so coarse that this phenomenon causes estimation
problems, then use a finer discretization; for a finer discretization, the "next
higher" elemental filter that absorbs the probability will not be based on a
much-too-high value of Q or R. Furthermore, specifically consider a moving-bank MMAE
(discussed in Section 2.6) to allow for a fine discretization without the burden of
populating the filter bank with an excessively large number of elemental filters that
results from the requirement to cover the entire range of possible Q or R values.

While  elemental  filter  3  is  initially  favored  in  Case  1,  as  shown  in  plot  (a), 
it  is  rejected  the  quickest  in  Case  4,  see  plot  (d),  as  compared  to  the  other  cases, 
because  (1)  the  too  large  assumption  of  the  dynamics  noise  strength  is  not  obscured 
by  a  poor  initial  state  covariance  estimate  as  in  Case  3  and,  (2)  compared  to  Case  2, 
elemental  filter  3  assumes  a  much  too  large  value  for  Q.  The  key  to  good  performance 
is  to  have  an  elemental  filter  based  on  a  model  that  only  slightly  overestimates  the 
noise strength; when the assumed Q value is significantly too high, the filter becomes
overly responsive to the measurements and then the subset of filters which feature
overestimated noise strengths becomes less distinguishable.

When  comparing  the  probability  gathered  by  elemental  filter  1  in  plots  (c) 
and  (d),  we  see  that  elemental  filter  1  is  rejected  more  quickly  in  Case  3.  The  poor 
initialization of the error covariance, P0, artificially helps us to see that the too low
assumption  for  Q  is  completely  inadequate  to  model  the  error  covariance  given  the 
measurements  taken.  On  the  other  hand,  the  more  appropriate  P0  assigned  in  Case  4 
does  not  give  the  too  low  Q  assumption  that  extra  boost  to  enhance  the  probability 
flow  away  from  elemental  filter  1. 

Now  that  we  have  compared  and  contrasted  many  aspects  of  these  four  cases, 
we  shall  look  at  the  individual  cases  more  closely.  The  remainder  of  this  section  is 
comprised  of  alternating  discussions  and  compilations  of  figures  which  apply  to  the 
specific  cases. 

Case  1:  A  poor  initial  state  covariance  and  a  set  of  parameter  values  for  the 
bank  of  elemental  filters  that  is  centered  on  the  true  dynamics  noise  strength.  A 
close  inspection  of  Figure  5.14(a  -  f)  for  elemental  filter  1  shows  that  the  state 
estimate  is  slowly  converging  towards  the  true  state  —  slowly  because  we  have  told 
the  filter  that  the  dynamics  model  is  very  good  (much  better  than  it  truly  is)  and 
thus  not  to  put  too  much  trust  in  the  measurements.  Additionally,  this  slowness 
is  exacerbated  by  the  poor  choice  for  the  initial  state  covariance,  as  can  be  easily 
demonstrated  by  comparison  to  the  Case  2  results,  i.e.,  see  Figure  5.18(a  -  f).  In 
particular,  look  at  time  zero  in  Figure  5.14(d,  f)  to  observe  that  the  state  estimate 
mean  error  (solid  line)  and  the  mean  plus  and  minus  one  sigma  (dash-dot  line)  values 
are  outside  the  zero  plus  and  minus  one  filter-computed  sigma  bounds  (gray  dashed 
line)  initially  created  by  the  initial  state  covariance.  For  proper  operation,  the  filter 
must  operate  inside  of  these  bounds.  To  see  how  the  estimate  progresses  along  the 
entire  rod,  see  Figure  5.14(i  -  p)  on  the  continuation  page  of  the  figure;  these  plots 
also  list  the  RMS  error  for  each  displayed  time  instant.  By  comparison,  for  elemental 
filter  2  in  Figure  5.15(a  -  f),  we  see  that  the  state  estimate  converges  more  quickly 
because  we  have  a  larger  and  more  realistic  assumed  value  of  the  noise  strength. 
Continuing  this  theme  for  elemental  filter  3,  we  see  in  Figure  5.16(a  -  f)  that  it 
converges  very  rapidly  and  then  overshoots  and  continually  overreacts  to  the  noise 
entering through the measurements. This can be quantified by examining the Q/R
ratio, which determines the filter gain, K. As Q/R increases, so does the Kalman
gain, which results in a wider filter bandwidth, i.e., faster responsiveness. On the other
hand, lower Q/R produces a smaller Kalman gain and thus the system must wait for
many measurements to make substantial changes in its estimates. Additionally, elemental
filter 3 absorbs the bulk of the probability early on because its higher assumed Q
allows it to compensate for the poor choice for P0.
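
The dependence of the gain on the Q/R ratio can be illustrated with a steady-state calculation. This is a minimal sketch that assumes a scalar random-walk model with a direct measurement of the state, which is far simpler than the rod model of this chapter; the function name and the chosen ratios are hypothetical.

```python
import math

def steady_state_gain(q, r):
    """Steady-state Kalman gain for a scalar random walk (ASSUMED model):
    x[i+1] = x[i] + w, var(w) = q;  z[i] = x[i] + v, var(v) = r."""
    # The pre-update variance M solves M**2 - q*M - q*r = 0 (positive root).
    m = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r))
    return m / (m + r)

for ratio in (0.01, 1.0, 100.0):
    print(f"Q/R = {ratio:6.2f}: K = {steady_state_gain(ratio, 1.0):.3f}")
```

In this simplified setting the gain climbs from near zero toward one as Q/R grows, which matches the qualitative behavior described for elemental filters 1 through 3.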
Figure 5.14 Simulation 2, Case 1 (P0poor): Elemental Filter 1. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.14 Simulation 2, Case 1 (P0poor): Elemental Filter 1 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.15 Simulation 2, Case 1 (P0poor): Elemental Filter 2. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.15 Simulation 2, Case 1 (P0poor): Elemental Filter 2 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.16 Simulation 2, Case 1 (P0poor): Elemental Filter 3. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.16 Simulation 2, Case 1 (P0poor): Elemental Filter 3 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.17 Simulation 2, Case 1 (P0poor): Blended Filter. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS
temperature error.
Figure 5.17 Simulation 2, Case 1 (P0poor): Blended Filter (cont’d). (h) Rod
temperature at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod temperature at
ti = 0.29 sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature at ti = 0.57
sec. (m) Rod temperature at ti = 0.71 sec. (n) Rod temperature at ti = 0.86 sec.
(o) Rod temperature at ti = 1.00 sec.
Case 2: A good initial state covariance and a set of parameter values for the
bank  of  elemental  filters  that  is  centered  on  the  true  dynamics  noise  strength.  The 
trends  from  Case  1  generally  hold  here,  except  now  that  we  have  an  honest  appraisal 
of  the  initial  state  covariance,  the  filter  can  properly  respond  to  the  initial  state 
estimate  bias  that  it  finds.  We  note  in  particular  that  the  filters  are  now  operating 
within  the  zero  plus  and  minus  one  sigma  bounds  (gray  dashed  line)  created  by 
the  initial  state  covariance  as  viewed  in  Figure  5.18(d,  f).  For  proper  operation,  the 
filter  must  operate  inside  of  these  bounds.  The  results  are  dramatically  different  from 
those  of  Case  1.  In  Figure  5.18(a  -  f),  we  see  that  the  state  estimate  converges  very 
rapidly,  even  though  the  elemental  filter  (number  1)  overestimates  the  quality  of  the 
dynamics  model.  In  fact,  the  convergence  is  so  swift,  that  all  three  elemental  filters 
for  this  case  compute  an  acceptable  state  estimate  in  just  a  portion  of  the  simulated 
time  period.  The  RMS  error  is  reduced  by  about  90%  for  the  first  two  elemental 
filters by the second time slice, as seen in Figures 5.18(j) and 5.20(j). As in Case 1,
elemental filter 3 [Figure 5.20(a - f)] converges very rapidly and then overshoots and
continually overreacts to the noise entering through the measurements; the initial
covariance estimate appears to have little bearing on a filter based on a model with
"high" Q. A high assumed value for Q "flushes" the initial conditions out of the
system because the gain is high when Q is high, i.e., when there is low confidence in
the dynamics model or simply a large amount of process noise.
Figure 5.18 Simulation 2, Case 2 (P0good): Elemental Filter 1. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.18 Simulation 2, Case 2 (P0good): Elemental Filter 1 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.19 Simulation 2, Case 2 (P0good): Elemental Filter 2. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.19 Simulation 2, Case 2 (P0good): Elemental Filter 2 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.20 Simulation 2, Case 2 (P0good): Elemental Filter 3. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.20 Simulation 2, Case 2 (P0good): Elemental Filter 3 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.21 Simulation 2, Case 2 (P0good): Blended Filter. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS
temperature error.
Figure 5.21 Simulation 2, Case 2 (P0good): Blended Filter (cont’d). (h) Rod
temperature at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod temperature at
ti = 0.29 sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature at ti = 0.57
sec. (m) Rod temperature at ti = 0.71 sec. (n) Rod temperature at ti = 0.86 sec.
(o) Rod temperature at ti = 1.00 sec.
Case 3: A poor initial state covariance and a set of parameter values for the
bank  of  elemental  filters  that  is  centered  at  ten  times  the  true  dynamics  noise  strength. 
A  close  inspection  of  Figure  5.22(a  -  f)  for  elemental  filter  1  shows  that  the  state 
estimate  is  slowly  converging  to  the  true  state  just  as  it  did  in  Case  1,  Figure  5.14. 
Again,  we  observe  at  time  zero  in  Figure  5.22(d,  f)  that  the  state  estimate  mean 
error  (solid  line)  and  mean  plus  and  minus  one  sigma  (dash-dot  line)  are  outside  the 
zero  plus  and  minus  one  filter-computed  sigma  bounds  (gray  dashed  line)  created  by 
the  initial  state  covariance.  The  estimate  progresses  along  the  entire  rod  as  shown 
in  Figure  5.22(i  -  p).  By  comparison,  for  elemental  filter  2  in  Figure  5.23(a  -  f), 
we  see  that  the  state  estimate  converges  more  quickly  than  for  elemental  filter  2  in 
Figure  5.15(a  -  f)  because  we  have  a  larger  assumed  value  for  the  dynamics  noise 
strength. As we have previously noted, a larger filter-assumed Q, for a given R,
gives rise to a larger gain, and the filter is thus more responsive to the measurements. The
model for elemental filter 3 greatly overstates the true dynamics noise strength [see
Figure 5.24(a - f)], and it converges very rapidly and then overshoots and continually
overreacts  to  the  noise  entering  through  the  measurements;  as  before,  the  initial 
covariance  estimate  appears  to  have  little  bearing  on  the  performance  of  a  filter 
based  on  this  model,  compared  to  the  ill  match  provided  by  overstating  Qtrue.  When 
the  filter’s  Q  is  much  too  high,  the  transient  caused  by  the  initial  conditions  is  very 
short  and  then  filter  performance  relies  heavily  on  the  measurements.  All  of  these 
attributes  contribute  to  driving  the  elemental  filter  2  mean  hypothesis  conditional 
probability  towards  one. 
Figure 5.22 Simulation 2, Case 3 (P0poor): Elemental Filter 1. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.
Figure 5.22 Simulation 2, Case 3 (P0poor): Elemental Filter 1 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
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Figure  5.23  Simulation  2,  Case  3  (P^oor):  Elemental  Filter  2.  (a)  Rod  temperature 
at  p  =  0  m.  (b)  Error  at  p  =  0  m.  (c)  Rod  temperature  at  p  =  0.5  m.  (d)  Error  at 
p  =  0.5  m.  (e)  Rod  temperature  at  p  =  1  m.  (f)  Error  at  p  =  1  m.  (g)  Likelihood 
quotient,  (h)  Hypothesis  conditional  probability. 
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Figure  5.23  Simulation  2,  Case  3  (P0poor):  Elemental  Filter  2  (cont’d).  (i)  Rod 
temperature  at  £*  =  0  sec.  (j)  Rod  temperature  at  tj  =  0.14  sec.  (k)  Rod  temperature 
at  ti  =  0.29  sec.  (1)  Rod  temperature  at  U  =  0.43  sec.  (m)  Rod  temperature 
at  ti  =  0.57  sec.  (n)  Rod  temperature  at  ti  =  0.71  sec.  (o)  Rod  temperature  at 
ti  =  0.86  sec.  (p)  Rod  temperature  at  ti  =  1.00  sec. 
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Figure  5.24  Simulation  2,  Case  3  (P^oor):  Elemental  Filter  3.  (a)  Rod  temperature 
at  p  =  0  m.  (b)  Error  at  p  =  0  m.  (c)  Rod  temperature  at  p  =  0.5  m.  (d)  Error  at 
p  =  0.5  m.  (e)  Rod  temperature  at  p  =  1  m.  (f)  Error  at  p  =  1  m.  (g)  Likelihood 
quotient,  (h)  Hypothesis  conditional  probability. 
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Figure  5.24  Simulation  2,  Case  3  (P0poor):  Elemental  Filter  3  (cont’d).  (i)  Rod 
temperature  at  ti  =  0  sec.  (j)  Rod  temperature  at  ti  =  0.14  sec.  (k)  Rod  temperature 
at  ti  =  0.29  sec.  (1)  Rod  temperature  at  U  =  0.43  sec.  (m)  Rod  temperature 
at  ti  =  0.57  sec.  (n)  Rod  temperature  at  ti  =  0.71  sec.  (o)  Rod  temperature  at 
ti  =  0.86  sec.  (p)  Rod  temperature  at  ti  =  1.00  sec. 
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Figure  5.25  Simulation  2,  Case  3  (Pq001):  Blended  Filter,  (a)  Rod  temperature  at 
p  =  0  m.  (b)  Error  at  p  =  0  m.  (c)  Rod  temperature  at  p  =  0.5  m.  (d)  Error  at 
p  =  0.5  m.  (e)  Rod  temperature  at  p  =  1  m.  (f)  Error  at  p  =  1  m.  (g)  Rod  RMS 
temperature  error. 
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Figure  5.25  Simulation  2,  Case  3  (P0poor):  Blended  Filter  (cont’d).  (h)  Rod  tem¬ 
perature  at  ti  =  0  sec.  (i)  Rod  temperature  at  =  0.14  sec.  (j)  Rod  temperature  at 
ti  =  0.29  sec.  (k)  Rod  temperature  at  ti  =  0.43  sec.  (1)  Rod  temperature  at  ti  =  0.57 
sec.  (m)  Rod  temperature  at  tt  =  0.71  sec.  (n)  Rod  temperature  at  ti  =  0.86  sec. 
(o)  Rod  temperature  at  ti  =  1.00  sec. 

Case 4: A good initial state covariance and a set of parameter values for the
bank of elemental filters that is centered at ten times the true dynamics noise strength.
Now that we have a good initial state covariance, elemental filter 1 appears to be
a better match relative to the second elemental filter because its filter-assumed low
(optimistic) assessment of the dynamics noise strength is tempered by a good assumed
value for the initial state covariance, in contrast to Case 3, where we had
a poor choice for P0. The poor (too small) choice for P0 in Case 3 made it easier
for the MMAE to recognize that the first elemental filter, with a too-low value for Q
(and thus a slow dynamic response), was based on the wrong model; hence the low
hypothesis conditional probability. To see the interplay
of the initial state covariance and the assumed dynamics noise strength more clearly,
we turn to Equation (5.11), which shows us that the updated state covariance, seeded
by P0, is “scaled” by the (“square” of the) state transition matrix and then added
to the discrete-time version of the dynamics noise strength, Qd, to accomplish time
propagation. Thus, the strength of the noise for the first filter appears higher, so
it often accounts for the variance in the system and is an attractive choice
for the MMAE. However, state estimation performance would be degraded if the
initial state covariance were purposefully set lower than the true value.
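
For reference, the covariance recursion described in words above can be written out in the standard finite-dimensional discrete-time form. Equation (5.11) itself is not reproduced in this section, so the block below is a sketch under that standard form; the symbols for the state transition matrix, the measurement matrix H, and the gain K are assumed notation rather than a quotation of the original equation.

```latex
% Standard discrete-time Kalman filter covariance recursion (assumed notation).
\begin{align}
  P(t_0) &= P_0 \\
  P(t_{i+1}^-) &= \Phi(t_{i+1},t_i)\,P(t_i^+)\,\Phi^T(t_{i+1},t_i) + Q_d \\
  K(t_i) &= P(t_i^-)\,H^T\left[H\,P(t_i^-)\,H^T + R\right]^{-1} \\
  P(t_i^+) &= P(t_i^-) - K(t_i)\,H\,P(t_i^-)
\end{align}
```

Written this way, the early values of the propagated covariance are dominated by the term seeded by P0, while Qd is added at every propagation step and eventually sets the level of the covariance, and hence of the gain.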

When an elemental filter is based on an exceedingly high assumed value for Q
(e.g., three orders of magnitude greater than Qtrue), the MMAE quickly recognizes
the mismodeled filter. This behavior can be seen readily by comparing Cases 3 and
4 to Cases 1 and 2, in which the third elemental filter is more conservatively set at
just two orders of magnitude greater than the true value. Since the measurement
covariance, R, is assumed known and is used to create all of the filters in the bank,
the corresponding gain in these filters is relatively high and thus the filters attempt
to follow the measurements. This situation creates a mismatch between the
filter-assumed error covariance and the covariance of the residuals, as can be observed in
the likelihood quotient histories for the third elemental filter in all four cases (but
more substantially for Cases 3 and 4) in Figures 5.16(g), 5.20(g), 5.24(g), and 5.28(g).
When a poor choice is made for P0, we have seen that the MMAE can more quickly
move probability away from elemental filters with underestimated Q. Conversely, a
poor choice for P0 compounds the difficulty of identifying an elemental filter based on
an assumed Q that is too high; compare the elemental filter 3 probability histories
for Cases 1 and 3 versus Cases 2 and 4.

Figure 5.26 Simulation 2, Case 4 (P0good): Elemental Filter 1. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.26 Simulation 2, Case 4 (P0good): Elemental Filter 1 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.27 Simulation 2, Case 4 (P0good): Elemental Filter 2. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.


[Figure 5.27 (cont’d), panels (i) - (p): rod temperature versus position (m); each panel is annotated with its RMS error in deg C.]

Figure 5.27 Simulation 2, Case 4 (P0good): Elemental Filter 2 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.28 Simulation 2, Case 4 (P0good): Elemental Filter 3. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.28 Simulation 2, Case 4 (P0good): Elemental Filter 3 (cont’d). (i) Rod
temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature
at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature
at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at
ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.29 Simulation 2, Case 4 (P0good): Blended Filter. (a) Rod temperature
at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS
temperature error.

Figure 5.29 Simulation 2, Case 4 (P0good): Blended Filter (cont’d). (h) Rod
temperature at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod temperature at
ti = 0.29 sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature at ti = 0.57
sec. (m) Rod temperature at ti = 0.71 sec. (n) Rod temperature at ti = 0.86 sec.
(o) Rod temperature at ti = 1.00 sec.

Blended estimates for all four cases. Since we have designed these experiments
with full knowledge of what the truth is, the blended filter results are often inferior
to the filter designed to match the simulated truth model fully. That is true here,
as can be seen by comparing the figures for the second elemental filters to those of
the blended filters. For example, compare Figures 5.19 and 5.21 or Figures 5.27 and
5.29. The blended result is biased because every elemental filter contributes to the
blended solution. This bias is attributed to the fact that we have set a minimum
threshold for the hypothesis conditional probabilities (see footnote 8), and thus even
filters based on completely mismatched models receive a nonzero weighting. We could of
course choose to blend only those elemental filter estimates whose probabilities exceed
this threshold; that adjustment was not pursued in this research.
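
The bias mechanism described above can be sketched in a few lines. This is a minimal illustration of a generic MMAE probability update with a lower bound followed by probability-weighted blending; it is not the implementation used in this research. The function names, the bound `p_min`, and the numerical values are all hypothetical, and the exact lower-bounding rule of Section 2.4.6 may differ.

```python
def update_probabilities(prior, likelihoods, p_min=0.001):
    """Bayes update of hypothesis conditional probabilities with a lower bound.

    prior       : probabilities from the previous sample time
    likelihoods : residual density values, one per elemental filter
    p_min       : hypothetical lower bound applied before renormalizing
    """
    weighted = [p * f for p, f in zip(prior, likelihoods)]
    total = sum(weighted)  # assumed nonzero in this sketch
    posterior = [w / total for w in weighted]
    bounded = [max(p, p_min) for p in posterior]
    norm = sum(bounded)
    return [p / norm for p in bounded]


def blend(estimates, probabilities):
    """Probability-weighted average of the elemental filter estimates."""
    return sum(p * x for p, x in zip(probabilities, estimates))


# Illustrative numbers: filter 2 is well matched, filters 1 and 3 are not.
prior = [1.0 / 3.0] * 3
likelihoods = [1e-9, 0.5, 1e-12]
estimates = [20.0, 22.0, 30.0]  # scalar temperature estimates (made up)

probs = update_probabilities(prior, likelihoods)
blended = blend(estimates, probs)
print(probs, blended)
```

Because the mismatched filters are held at the lower bound rather than at zero, the blended value is pulled slightly away from the estimate of the well-matched filter, which is the source of the bias noted above.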

Finally, we draw attention to another consequence of improperly setting
the initial state covariance. Compare blended plots (h) for Case 1 in Figure 5.17
and Case 2 in Figure 5.21, or for an even better visual, compare blended plots (h) for
Case 3 in Figure 5.25 and Case 4 in Figure 5.29. Note that for P0good = 25, the
RMS error has been reduced from 5 to only 2.5 after just one measurement update,
whereas for P0poor = 1, the RMS error has only been marginally reduced, by about
5%. This is completely attributable to the choices for the initial state covariance.
(As a side note, we can also determine the number of sensor segments used to gather
the temperature data from the rod by noting the four transitions in Figures 5.21(h)
and 5.29(h).)
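
The effect of the initial state covariance on the first few measurement updates can be illustrated with a scalar sketch. It assumes a single directly measured state, a unit state transition value, and made-up values for Qd and R, so it is not expected to reproduce the RMS figures quoted above; only the two P0 values (25 and 1) are taken from the text, and `gain_history` is a hypothetical helper.

```python
def gain_history(p0, q_d, r, n_updates, phi=1.0):
    """Return the scalar Kalman gain at each of n_updates measurement updates."""
    gains = []
    p_minus = p0  # covariance before the first update is the assumed P0
    for _ in range(n_updates):
        k = p_minus / (p_minus + r)         # gain for this update
        gains.append(k)
        p_plus = (1.0 - k) * p_minus        # measurement update
        p_minus = phi * p_plus * phi + q_d  # propagate to the next sample time
    return gains


Q_D, R_ASSUMED = 0.1, 5.0  # illustrative values only
good = gain_history(25.0, Q_D, R_ASSUMED, 100)
poor = gain_history(1.0, Q_D, R_ASSUMED, 100)
print("first gains:", round(good[0], 3), round(poor[0], 3))
print("final gains:", round(good[-1], 3), round(poor[-1], 3))
```

In this sketch the larger P0 produces a much larger first gain, so the first measurement removes more of the initial error, while both gain histories settle to the same steady-state value once the transient from P0 has died out.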

8 See Section 2.4.6 for a discussion on lower bounding the hypothesis conditional probabilities.

5.4 Simulation 3

While the first simulation demonstrated how we could obtain an order-of-magnitude
estimate of Q using an MMAE filter bank populated with five elemental
filters, this simulation shows how we can get an accurate estimate of the
measurement-corruption noise covariance, R, using another bank of five elemental
filters. The first simulation demonstrated that the MMAE had trouble distinguishing
among “closely” spaced filters in terms of the Q parameter; the best results
were obtained for a discretization level of 100. Discretization of the R parameter
does not suffer from this malady, and hence the focus shifts to having enough filters
to cover the range of possibilities. Table 5.7 presents the pertinent elemental filter
design parameters for two cases of interest. In the first case, we place the “center”
filter at what we know is the value for Rtrue, and in the second case, we allow for a
slight upward offset from this position. Figure 5.30 gives a graphical display of the
separation between the elemental filters for both cases together on a single axis.

Case  Filter  Q      R                     x0  P0
1     1       Qtrue  (1/25) Rtrue = 0.2    25  25
1     2       Qtrue  (1/5) Rtrue = 1       25  25
1     3       Qtrue  Rtrue = 5             25  25
1     4       Qtrue  5 Rtrue = 25          25  25
1     5       Qtrue  25 Rtrue = 125        25  25
2     1       Qtrue  (2/25) Rtrue = 0.4    25  25
2     2       Qtrue  (2/5) Rtrue = 2       25  25
2     3       Qtrue  2 Rtrue = 10          25  25
2     4       Qtrue  10 Rtrue = 50         25  25
2     5       Qtrue  50 Rtrue = 250        25  25

Table 5.7 Simulation 3: Elemental Filter Parameters

   O     □     ★     □     O     □     O     □     O     □
  0.2   0.4    1     2     5    10    25    50   125   250    R

Figure 5.30 Simulation 3: Measurement-corruption noise covariance. Legend: O
elemental filters for Case 1; □ elemental filters for Case 2; ★ true parameter within
the filter bank. (The filter spacing is nonlinear, but appears linear for illustration
purposes.)
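
The two banks in Table 5.7 follow a simple geometric pattern: five assumed R values spaced by a factor of five about a center value. The sketch below reproduces that pattern; the helper name `build_r_bank` is hypothetical, and the value Rtrue = 5 is read from the center row of the Case 1 bank in Table 5.7.

```python
def build_r_bank(r_center, factor=5.0, n_filters=5):
    """Return n_filters assumed R values spaced geometrically about r_center."""
    half = n_filters // 2
    return [r_center * factor ** k for k in range(-half, half + 1)]


R_TRUE = 5.0                          # center value of the Case 1 bank
case1 = build_r_bank(R_TRUE)          # bank centered on Rtrue
case2 = build_r_bank(2.0 * R_TRUE)    # bank offset upward by a factor of two

print("Case 1:", [round(r, 3) for r in case1])
print("Case 2:", [round(r, 3) for r in case2])
```

With the center doubled for Case 2, the true value falls between the second and third filters of the bank, which is the offset situation examined in that case.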

Note that the assumed R values for the five filters are separated from their
nearest neighbors by just a factor of five (versus one to two orders of magnitude for
Q in the previous simulation). Even though we saw in Simulation 2 that the best
filter tends to overestimate the true dynamics noise strength for “fine” discretization
levels or for low Q/R ratio, in this simulation, discrimination of the five elemental
filters did not depend strongly on the Q used. We obtained essentially the same
results for 2Q and 5Q, while at 10Q the state estimation performance started to
degrade. However, measurement noise covariance estimation was still good. Only
the results for the true Q are shown in the following figures for the filter bank centered
on Rtrue and the filter bank offset from Rtrue.

As we saw in the previous simulation, a poor choice for P0 can have disastrous
effects on the initial response of the system to new measurements. When P0 is too
small, the response of the filter to new measurements is slowed. This effect is much
the same as overestimating the true R. Thus, only the results for an adequate initial
state covariance are given.

As expected, the hypothesis conditional probability for elemental filter 3 tends
towards one in both filter banks, as shown in Figure 5.31, while the probability for the
other elemental filters tends towards zero, but at a faster rate than seen in Simulation
1. The centered filter bank is perhaps unrealistic for a real-world situation, but as we
can see here, the results for Case 2, in which the true measurement-corruption noise
covariance lies between the assumed values of two filters in the bank, are still quite
good, as shown in Figure 5.31(b). These trends were very similar for the other three
values of Q tested; thus we primarily discuss the case in which we know the
true value of Q. Additionally, Simulation 1 showed us that we need only a good
guess to achieve good results, since varying Q slightly had no appreciable effect on
performance.

Figure 5.31 Simulation 3: Hypothesis conditional probability flow. (a) Case
1 filter bank: {(1/25)Rtrue, (1/5)Rtrue, Rtrue, 5Rtrue, 25Rtrue}. (b) Case 2 filter bank:
{(2/25)Rtrue, (2/5)Rtrue, 2Rtrue, 10Rtrue, 50Rtrue}. To make these plots more legible, only the
mean hypothesis conditional probabilities for times {t0, t2, ..., t100} are displayed.

The results for Case 1 shown in plot (a) of Figure 5.31 are to be expected
since the third filter was designed using the foreknowledge of the truth model.
Similarly, we anticipate that the two elemental filters closest to the true value of the
noise covariance (an underestimate and an overestimate) would receive “all” of the
probability flow for Case 2, as shown in plot (b). As we saw in Simulation 1, a
filter that slightly overestimates the expected covariance is favored over one that
underestimates the covariance. A filter based on an underestimate for the covariance
has its assumptions violated on a regular basis since the real-world noise covariance
exceeds that which was programmed into the elemental filter. Thus, the third
elemental filter receives the bulk of the probability. Although the second elemental filter
often underestimates the true covariance, it is correct often enough that it receives
a small (but noteworthy) share of the probability. Note that if we try to use an even
larger overestimate for the measurement-corruption noise covariance, then we end
up with an elemental filter that “thinks” that the measurements are so sloppy as
to be relatively worthless as compared to the quality of the dynamics model. This
effect can be seen clearly in the probability calculation (given in Section 2.3.3.3):
the probability that the hypothesis is correct is inversely proportional to the
square root of the determinant of the measurement-corruption noise covariance, i.e.,
an excessively large covariance can yield a very small probability due to this scaling
term. For a reasonable overestimate, this scaling is balanced by the exponential
portion of the probability density function.
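
The balance between the scaling term and the exponential term can be checked with a scalar sketch. It assumes a zero-mean Gaussian residual whose variance is simply the measurement noise variance (the contribution of the state error covariance is ignored), and it evaluates the expected log-density of that residual under different assumed variances. The function name and the numerical values are illustrative only.

```python
import math


def expected_log_density(assumed_var, true_var):
    """Expected log of a scalar zero-mean Gaussian density with variance
    assumed_var, when the residual actually has variance true_var.

    The first term comes from the 1/sqrt(variance) scaling factor; the
    second is the expected value of the exponent.
    """
    scaling_term = -0.5 * math.log(2.0 * math.pi * assumed_var)
    exponent_term = -true_var / (2.0 * assumed_var)
    return scaling_term + exponent_term


TRUE_VAR = 5.0  # illustrative value
for factor in (0.04, 0.2, 0.4, 1.0, 2.0, 5.0, 25.0):
    value = expected_log_density(factor * TRUE_VAR, TRUE_VAR)
    print(f"assumed = {factor:5.2f} x true  ->  expected log-density = {value:7.3f}")
```

Under these assumptions the expected log-density peaks when the assumed variance equals the true one, a moderate overestimate is penalized less than an underestimate by the same factor, and a very large overestimate is penalized through the scaling term alone, which is consistent with the behavior described above.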

Case 1: Adapting to an unknown Rtrue using a filter bank centered on Rtrue.
It comes as no surprise that, when we match an elemental-filter-assumed parameter
value to the real-world parameter value, the MMAE performs very well. We begin
our analysis by observing the probability flow in Figure 5.31(a). Then, we investigate
the (h) plots for the individual elemental filters for this case in Figures 5.32 through
5.36 to see how much the hypothesis conditional probabilities varied over the course
of the simulation. For this case, only elemental filter 3 seems to have been based
on a good hypothesis of the true R; this is not surprising. If we look at plot (g) in
conjunction with plot (h), we can see that the likelihood quotient gives a reliable
account of which filter matches the simulated world the best. When the filter-assumed
R is smaller than Rtrue, the likelihood quotient is greater than the expected M = 5 that is
exhibited by a filter for the best hypothesis. As the assumed value for R increases,
the likelihood quotient decreases. As shown in the development presented in the
introduction section for this chapter, the likelihood quotient is roughly proportional to the
ratio of the true R to the assumed R for the elemental filter. Hence, the results we
have just seen in plots (g) are fairly predictable. As mentioned above, convergence
is dictated by the Q/R ratio (which gives rise to the steady state gain K); hence
the elemental filters with the smallest assumed R converged the quickest to the rod
temperature.
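
The trend in the likelihood quotient can be approximated with a one-line rule. The sketch below assumes that the residual covariance is dominated by the measurement noise, so that the expected likelihood quotient is roughly M times the ratio of the true R to the assumed R, with M = 5 for a matched filter as stated above. The function name is hypothetical and the rule is an approximation, not the development given in the chapter introduction.

```python
def approx_likelihood_quotient(r_true, r_assumed, m=5):
    """Rough expected likelihood quotient for an elemental filter.

    Assumes the residual covariance is dominated by the measurement noise,
    so the quotient scales as m * r_true / r_assumed.
    """
    return m * r_true / r_assumed


R_TRUE = 5.0
for r_assumed in (0.2, 1.0, 5.0, 25.0, 125.0):  # Case 1 bank from Table 5.7
    lq = approx_likelihood_quotient(R_TRUE, r_assumed)
    print(f"assumed R = {r_assumed:6.1f}  ->  approximate likelihood quotient = {lq:7.2f}")
```

Under this approximation the matched filter sits at M = 5, filters with a smaller assumed R lie well above it, and filters with a larger assumed R lie below it, in line with the ordering seen in plots (g).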

Continuing our analysis for this case, we see that when the assumed R is
significantly smaller than the true noise covariance, the dynamics model is essentially
cast aside, i.e., the Kalman gain is very high. To support this assertion empirically,
we inspect the behavior displayed in plots (a) through (f) in Figure 5.32 and, to a
lesser degree, in Figure 5.33, since the assumed R is closer to the true R in the latter.
Elemental filter 3 represents a balance between the dynamics and measurement
models and is also the best elemental filter in the bank. The fourth and fifth elemental
filters behave in the opposite manner: they “trust” the dynamics model more than
they should and place less emphasis on the new measurements, which can be seen
in the slowly converging state estimates in Figure 5.35 and, to a greater degree, in
Figure 5.36.

Figure 5.32 Simulation 3, Case 1: Elemental Filter 1. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.32 Simulation 3, Case 1: Elemental Filter 1 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.33 Simulation 3, Case 1: Elemental Filter 2. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.33 Simulation 3, Case 1: Elemental Filter 2 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.34 Simulation 3, Case 1: Elemental Filter 3. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.34 Simulation 3, Case 1: Elemental Filter 3 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.35 Simulation 3, Case 1: Elemental Filter 4. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.35 Simulation 3, Case 1: Elemental Filter 4 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.


[Figure 5.36, panels (a) - (h): quantities plotted versus time (s).]

Figure 5.36 Simulation 3, Case 1: Elemental Filter 5. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.36 Simulation 3, Case 1: Elemental Filter 5 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.37 Simulation 3, Case 1: Blended Filter. (a) Rod temperature at p = 0
m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS temperature
error.

Figure 5.37 Simulation 3, Case 1: Blended Filter (cont’d). (h) Rod temperature
at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod temperature at ti = 0.29
sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature at ti = 0.57 sec.
(m) Rod temperature at ti = 0.71 sec. (n) Rod temperature at ti = 0.86 sec. (o) Rod
temperature at ti = 1.00 sec.

Case 2: Adapting to an unknown Rtrue using a filter bank purposefully offset
from Rtrue. As expected, the second and third elemental filters were deemed the most
likely, as seen in Figure 5.31(b). The hypothesis conditional probability histories in
plot (h) of Figures 5.38 through 5.42 show that the probabilities for elemental filters
1, 4, and 5 are nearly zero in every Monte Carlo run, while the relative share of the
probability varied considerably between elemental filters 2 and 3, as indicated by
the rather large standard deviation evident in the mean plus and minus one sigma
(dashed) lines on plots (h). Recall that, for each Monte Carlo run, a sample of the
measurement-corruption noise process v is drawn. (Over the course of the fifty runs,
these samples are representative of a white Gaussian noise process with covariance
R.) So, for a particular simulation run, if the sampled R is high (where R is the repeated
eigenvalue of the matrix R), then the MMAE more heavily favors elemental
filter 3, and if it is low, the MMAE more heavily favors elemental filter 2. As anticipated,
the mean probability flow to elemental filter 3 is greater than to elemental filter 2.
An examination of plot (g) reveals that these two elemental filters are in consonance
with the real-world conditions since their likelihood quotients are near M = 5; the
likelihood quotient for elemental filter 3 (R3 = 2Rtrue) is less than five (about two and
a half), while the one for elemental filter 2 (R2 = 0.4Rtrue) is too large (around
12). The rest of the elemental filters were poorly matched to the simulated real
world and thus received essentially zero probability for the entire simulation, and
their likelihood quotients in plot (g) were either too large (for filter 1) or too small
(for filters 4 and 5). Note that the RMS error at time ti = 1 sec is lowest for the blended
filter output [Figure 5.43(o)], versus any of the elemental filters alone, even though
none of the elemental filters were based on a correct value for R. The blended filter
output is due almost entirely to an effective blending of elemental filters 2 and 3.
Hence, in a real-world environment, where we might have only incomplete and/or
low quality information on the true noise environment, the blended estimates may
prove to be the most useful.

A final comment. While it was entirely anticipated that elemental filter 3
would absorb all of the probability in Case 1, it is useful to note that even when
the assumed value is slightly increased (in this case it was doubled), the elemental
filter still matches quite well, as demonstrated in Case 2. Additionally, the elemental
filter that “resides on the other side” of the true R gathered the remainder of the
probability, as previously noted.

Figure 5.38 Simulation 3, Case 2: Elemental Filter 1. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.


5-92 


26 


24 

22 

20 

18 

26 

24 

22 
20  ; 
18 


RMS  error  =  5  deg  C 


RMS  error  =  0.22  deg  C 


0.2  0.4  0.6  0.8 

position  (m) 

(k) 


0.2  0.4  0.6  0.8 

position  (m) 

(o) 


0 

0.2 

0.4  0.6 

position  (m) 

(i) 

0.8 

1 

0 

0.2 

0.4  0.6 

position  (m) 

(m) 

0.8 

1 

26 

24 

22 

20 

18 


(j) 


RMS  error  =  0.16  deg  C 


18 


0  0.2 

0.4  0.6 

position  (m) 

(1) 

0.8 

0  0.2 

0.4  0.6 

position  (m) 

(n) 

0.8 

0.4  0.6 

position  (m) 
(P) 


Figure  5.38  Simulation  3,  Case  2:  Elemental  Filter  1  (cont’d).  (i)  Rod  temperature 
at  ti  —  0  sec.  (j)  Rod  temperature  at  U  =  0.14  sec.  (k)  Rod  temperature  at  t,t  =  0.29 
sec.  (1)  Rod  temperature  at  U  =  0.43  sec.  (m)  Rod  temperature  at  ti  =  0.57  sec. 
(n)  Rod  temperature  at  ti  =  0.71  sec.  (o)  Rod  temperature  at  ti  =  0.86  sec.  (p)  Rod 
temperature  at  ti  =  1.00  sec. 
Figure 5.39 Simulation 3, Case 2: Elemental Filter 2. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.39 Simulation 3, Case 2: Elemental Filter 2 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.40 Simulation 3, Case 2: Elemental Filter 3. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.40 Simulation 3, Case 2: Elemental Filter 3 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.41 Simulation 3, Case 2: Elemental Filter 4. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.41 Simulation 3, Case 2: Elemental Filter 4 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.42 Simulation 3, Case 2: Elemental Filter 5. (a) Rod temperature at
p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at
p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood
quotient. (h) Hypothesis conditional probability.

Figure 5.42 Simulation 3, Case 2: Elemental Filter 5 (cont’d). (i) Rod temperature
at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod temperature at ti = 0.29
sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod temperature at ti = 0.57 sec.
(n) Rod temperature at ti = 0.71 sec. (o) Rod temperature at ti = 0.86 sec. (p) Rod
temperature at ti = 1.00 sec.

Figure 5.43 Simulation 3, Case 2: Blended Filter. (a) Rod temperature at p = 0
m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS temperature
error.

Figure 5.43 Simulation 3, Case 2: Blended Filter (cont’d). (h) Rod temperature
at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod temperature at ti = 0.29
sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature at ti = 0.57 sec.
(m) Rod temperature at ti = 0.71 sec. (n) Rod temperature at ti = 0.86 sec. (o) Rod
temperature at ti = 1.00 sec.

5 Rmedian    50

Table 5.8 Simulation 4: Elemental Filter Parameters

5.5 Simulation 4

The preceding simulation showed how we could obtain a good estimate of an
unknown constant Rtrue. This simulation demonstrates the capability of the MMAE
to adapt to a time-varying Rtrue which varies over the interval [Rmin, Rmax] = [1, 101].
Specifically, we seek an accurate estimate of Rtrue as it either linearly increases
or decreases during the one-second interval of interest. We shall use a bank of
five elemental filters similar to those used in Simulation 3; the difference lies in
the R value used to center the filter bank. In the previous simulation, the center
elemental filter was built using R = 5, whereas in this simulation, the median filter
is “located” at R = 10 so that it coincides with the geometric mean of the minimum
and maximum values for Rtrue. Thus, Rmedian = √(Rmin Rmax) = √101 ≈ 10. The
relative spacing of the elemental filters is the same factor-of-five spacing, as are the
rest of the elemental filter parameters. The elemental filter design parameters are
tabulated in Table 5.8 for convenience.
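The placement rule just described can be sketched in a few lines. This is only an illustration of the geometric-mean centering and factor-of-five spacing stated above; the function name `build_filter_bank` and its arguments are hypothetical and do not come from the simulation software used in this research.

```python
import math

def build_filter_bank(r_min, r_max, num_filters=5, spacing=5.0):
    """Return assumed R values centred on the rounded geometric mean."""
    # Geometric mean of the extremes, rounded to the nearest integer
    # (sqrt(1 * 101) is slightly above 10, which the text rounds to 10).
    r_median = round(math.sqrt(r_min * r_max))
    half = num_filters // 2
    # Neighbouring filters differ by a constant factor on either side
    # of the median filter.
    return [r_median * spacing ** (k - half) for k in range(num_filters)]

bank = build_filter_bank(1.0, 101.0)
print(bank)
```

The resulting values agree with the R column of Table 5.9 to within floating-point rounding.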

We shall consider two distinct cases of linearly changing Rtrue: increasing and
decreasing. The initial results for these cases are shown in Figures 5.44 and 5.52,
respectively. The MMAE state estimate is quite good for both cases; the MMAE
state estimate is off by less than 1 °C (RMS), as seen in Figures 5.51(o) and 5.59(o).
Additionally, by inspecting Figures 5.44 and 5.52, we see that the probability flow
among the elemental filters is slightly different, i.e., the figures are not mirror images
of each other as we would expect if the MMAE handled Rtrue increases the same as
decreases. An increase in Rtrue creates a harsher noise environment for MMAE adaptation,
versus a decreasing Rtrue, which presents a more benign noise environment.
“Harsh” is in the sense that the measurements are less precise; “benign” is in the
sense that the measurements are more precise.

Since  the  MMAE  “prefers”  a  filter  which  overestimates  the  true  measurement- 
corruption  noise  covariance  to  one  which  underestimates  the  true  covariance,  we 
should  anticipate  that  the  MMAE  “switches”  filters  more  quickly  for  the  increasing 
true  covariance  case.  Said  another  way,  increases  in  the  true  covariance  result  in  a 
more  aggressive  flow  of  probability  to  an  elemental  filter  based  on  a  larger  covariance. 
For  the  decreasing  true  covariance  case,  an  elemental  filter  based  on  a  too-large  value 
is  not  so  quickly  cast  aside  for  a  filter  based  on  a  more  realistic  model. 
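One way to see why an overestimated covariance is penalized less than an underestimated one is sketched below. The sketch rests on simplifying assumptions that are not taken from this chapter: the residual is treated as a zero-mean Gaussian scalar with true variance `a_true`, the elemental filter assumes variance `a_k`, and the expected log-likelihood is then -0.5 [ln(2π a_k) + a_true / a_k]. The function name and the mismatch factor `c` are illustrative only.

```python
import math

def expected_log_likelihood(a_true, a_k):
    """E[ln N(r; 0, a_k)] when the scalar residual r has true variance a_true."""
    return -0.5 * (math.log(2.0 * math.pi * a_k) + a_true / a_k)

a_true = 1.0
c = 5.0  # mismatch factor, chosen to mirror the factor-of-five bank spacing

matched = expected_log_likelihood(a_true, a_true)
over = expected_log_likelihood(a_true, c * a_true)    # assumed variance too large
under = expected_log_likelihood(a_true, a_true / c)   # assumed variance too small

# Penalty relative to the matched filter (a larger penalty means less likely).
over_penalty = matched - over
under_penalty = matched - under
print("overestimate penalty :", over_penalty)
print("underestimate penalty:", under_penalty)
```

Under these assumptions the penalty for assuming a variance five times too small exceeds the penalty for assuming one five times too large, which is consistent with the preference described above.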

5.5.1 Simulation 4, Case 1 (Increasing Rtrue). For this first case, the
true measurement-corruption noise covariance, Rtrue, varies linearly for times ti =
{0, 0.01, 0.02, ..., 1} according to

$$R_{\text{true}}(t_i) = 100\,(t_i + 0.01) \tag{5.12}$$

Thus Rtrue begins at 1 and ends at 101; this is shown by the dashed line
in Figure 5.44, which is read against the scale on the right-hand side of the plot. Note
that the elemental filters in the bank completely cover this range of values for Rtrue
and that no attempt has been made to optimize the placement of these filters.
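As a quick check of the ramp and of the coverage claim, the schedule of Equation (5.12) can be tabulated against the assumed R values of the bank (taken from the R column of Table 5.9). The helper name `r_true_increasing` is hypothetical.

```python
import math

def r_true_increasing(t):
    """Equation (5.12): true noise covariance at sample time t (seconds)."""
    return 100.0 * (t + 0.01)

# Sample times 0, 0.01, ..., 1 are built from integers so that
# floating-point error does not accumulate.
times = [i / 100.0 for i in range(101)]
schedule = [r_true_increasing(t) for t in times]

# Assumed R values of the five elemental filters (Table 5.9, column R).
bank = [0.4, 2.0, 10.0, 50.0, 250.0]
covered = all(min(bank) <= r <= max(bank) for r in schedule)
print(schedule[0], schedule[-1], covered)
```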

Figure 5.44 Simulation 4, Case 1 (increasing Rtrue): Hypothesis conditional probability.

At the beginning of this simulation, Rtrue(ti = 0) = 1 and elemental filters 1
and 2 represent the best hypotheses since they have R values of 0.4 and 2, respectively;
see Figure 5.44. As the true R increases to 2 at the next time step, elemental
filter 1 does not match quite as well and elemental filter 2 matches perfectly. However,
it takes a few more update cycles for the change to be completely “noticed”,
and during that time the true value has continued to increase, so the next
elemental filter (number 3) begins to become a better fit. Soon after elemental filter 3
absorbs all of the probability, it too becomes less likely as the true R starts
to “look” like the next filter in the bank, R4 = 50. Note that only two elemental
filters register an appreciable amount of probability at any one time; thus the usual
situation is one in which either a single elemental filter has a very good hypothesis or the
true value falls between the hypothesized values for neighboring filters.

The hypothesis conditional probability curves for each elemental filter resemble
trapezoids. Each begins when the true R is about one half of the filter-hypothesized
R, and that elemental filter remains the most likely one until the true R reaches the
halfway point for the next elemental filter in the bank, as can be seen clearly in Figure 5.44.
Therefore, we can predict that elemental filter 5 would become the most likely filter
at time 1.25 seconds if we were to run the simulation for that length of time.
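The 1.25-second figure follows from the half-value rule of thumb just described, if we assume the ramp of Equation (5.12) were simply continued past one second (an extrapolation, since the simulation stops at ti = 1 sec):

```latex
R_{\text{true}}(t) = \tfrac{1}{2} R_5 = \tfrac{1}{2}(250) = 125
\quad\Longrightarrow\quad
100\,(t + 0.01) = 125
\quad\Longrightarrow\quad
t = 1.24\ \text{sec} \approx 1.25\ \text{sec}.
```

The same rule gives handover points of roughly Rtrue = 5 for elemental filter 3 and Rtrue = 25 for elemental filter 4, which is close to the values of 7 and 27 at which those filters first exceed a probability of 0.5 in Table 5.9.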

Filter   R      Rtrue when pk > 0.5   Rtrue when pk > 0.1
1        0.4    1                     1, 2
2        2      2, ..., 6             1, ..., 9
3        10     7, ..., 26            5, ..., 32
4        50     27, ..., 101          19, ..., 101
5        250    -                     88, ..., 101

Table 5.9 Simulation 4, Case 1 (increasing Rtrue): Best hypothesis. Filter 5 was
never the best filter at any time during this simulation; hence the bar.

The third column in Table 5.9 gives the range of Rtrue values for which each filter is
the best filter. Note that the ranges are mutually exclusive. In the fourth column, note that
only two filters are “in force” at any one time with 10% or more likelihood. We
could add yet another column corresponding to the Rtrue values when the standard
deviation in the mean hypothesis conditional probability history is about zero; see
plot (h) of Figures 5.46 through 5.50 at the end of this section. This represents
the times when an elemental filter matched the true value without regard to the
particular realization of measurement pseudonoise added. For example, elemental
filter 3 is the best when Rtrue = 10, which is to be expected, up until Rtrue = 14,
which is a fairly small window given the neighboring filter is at R = 50. Elemental
filter 4 has a much wider “perfect” match zone, from Rtrue = 40 until Rtrue = 64.
Furthermore, the first few sample periods embody the usual initial transient in each
of the elemental filters and thus these first few time instants are not particularly
indicative of the true capabilities of the MMAE. For instance, in this research, all of
the elemental filters in the bank are assumed to be equally likely when the simulation
begins; thus, even the most mismatched filter gets 1/Kth of the probability at the
start.
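To make the role of the equal starting probabilities concrete, the following sketch applies one step of the textbook MMAE Bayes rule with Gaussian residual densities. It is assumed here rather than quoted from this chapter, and several details are purely illustrative: the function name `update_probabilities`, the choice A_k = R_k I for the residual covariances, and the use of a single shared residual vector for all K filters (in an actual bank each elemental filter produces its own residual).

```python
import numpy as np

def update_probabilities(prior, residuals, covariances):
    """One Bayes update of the hypothesis conditional probabilities.

    prior       : length-K array of p_k at the previous sample time
    residuals   : list of K residual vectors r_k, each of length M
    covariances : list of K residual covariance matrices A_k, each M x M
    """
    log_weights = np.empty(len(prior))
    for k, (r, a) in enumerate(zip(residuals, covariances)):
        quotient = r @ np.linalg.solve(a, r)      # L_k = r^T A^{-1} r
        _, logdet = np.linalg.slogdet(a)
        # Log of the Gaussian density of the residual under hypothesis k.
        log_density = -0.5 * (quotient + logdet + r.size * np.log(2.0 * np.pi))
        log_weights[k] = np.log(prior[k]) + log_density
    log_weights -= log_weights.max()              # guard against underflow
    weights = np.exp(log_weights)
    return weights / weights.sum()

K, M = 5, 5
prior = np.full(K, 1.0 / K)                       # equally likely at the start
assumed_r = np.array([0.4, 2.0, 10.0, 50.0, 250.0])
covs = [r_k * np.eye(M) for r_k in assumed_r]     # illustrative: A_k = R_k * I
rng = np.random.default_rng(0)
r = rng.normal(scale=np.sqrt(10.0), size=M)       # residual drawn with variance 10
posterior = update_probabilities(prior, [r] * K, covs)
print(posterior)
```

Because every filter starts with the same prior, the first few updates are driven entirely by the residual densities, which is one reason the initial sample periods say little about steady-state behavior.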

An interesting trend that we can easily see in the plot (g) likelihood quotient
histories is readily explained using the formula developed in Section 5.1 for the steady-state
likelihood quotient for the kth elemental filter at time ti (repeated here for our
convenience)

$$E\{ L_k(t_i) \big|_{t_i \to \infty} \} = \frac{M}{R_k}\,R_{\text{true}} \tag{5.10}$$
As the truth value for R increases, so do the likelihood quotients (which of course
never really make it to steady state), since we see that Rtrue is in the numerator while
M/Rk remains unchanged. Thus, the Rtrue “ramp” is apparent in the likelihood
quotient “ramps” seen in Figure 5.45.

Figure 5.45 Simulation 4, Case 1 (increasing Rtrue): Likelihood quotient. This plot
has been clipped at L = 20 since only the elemental filters operating near L = 5
absorb an appreciable amount of probability.
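The ramp behavior can be illustrated numerically if we assume that the steady-state mean of the likelihood quotient takes the form (M / Rk) Rtrue with M = 5. That form is an assumption on our part; it is consistent with the text's remark that Rtrue sits in the numerator and with the values of about two and a half and about 12 quoted for Simulation 3, Case 2. The function name `expected_quotient` is hypothetical.

```python
import math

M = 5  # value of M quoted in the text (a well-matched filter has L_k near M = 5)

def expected_quotient(r_true, r_k, m=M):
    """Assumed steady-state mean of the likelihood quotient: (M / R_k) * R_true."""
    return m * r_true / r_k

# Simulation 3, Case 2 check: R3 = 2 Rtrue and R2 = 0.4 Rtrue.
sim3_filter3 = expected_quotient(1.0, 2.0)
sim3_filter2 = expected_quotient(1.0, 0.4)
print(sim3_filter3, sim3_filter2)

# Simulation 4, Case 1: each filter's expected quotient grows linearly in time.
bank = [0.4, 2.0, 10.0, 50.0, 250.0]
for t in (0.0, 0.5, 1.0):
    r_true = 100.0 * (t + 0.01)
    print(t, [round(expected_quotient(r_true, r_k), 2) for r_k in bank])
```

Since Rtrue is linear in ti and M/Rk is a constant for each filter, every expected quotient is a straight line through the origin of the Rtrue axis, with the steepest slope belonging to the filter with the smallest assumed R.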

In general, we note that the filters with smaller assumed values for R gave
better state estimates, because in the beginning when Rtrue was also small, we had the
case of precise measurements; hence estimation is very good. Specifically, elemental
filter 1 converged rapidly because it accurately reflected the high quality of the
measurements; see Figure 5.46. Recall that when R is small, the (Q/R) ratio is
large; therefore the gain is large and that gives us the rapid convergence. However,
as the measurement quality waned, the state estimate started to “wander” and its
RMS error increased as shown in Figures 5.46 (i) through (p). At the other end of
the filter bank, elemental filter 5 showed a relatively large RMS error [Figure 5.50
(i) through (p)] because it initially assumed that the measurements were “rough”,
and then as time progressed, the measurements truly did fall in quality.

Overall, the MMAE state estimate is quite good given the fact that we rarely
have an elemental filter based on the perfect model. Because the elemental filters
are more closely spaced at small values of R, the RMS state error is smaller at the
beginning (after the initial transient) of the simulation, as seen in Figure 5.51(g) for all
time and then in plots (h) through (o) along the length of the
rod. Of course, the fact that a small truth R simply means that the measurements are
more precise contributes to our excellent performance for small Rtrue. Since we knew
a priori that R was going to change linearly, perhaps we could have improved the state
estimation performance by spacing the elemental filters linearly. An inspection of
Figure 5.45 shows that the likelihood quotient histories are all roughly linear. Thus,
the quadratic nature of the likelihood quotient, $L_k(t_i) = r_k^T(t_i)\,A_k^{-1}(t_i)\,r_k(t_i)$, with
respect to the residuals matches up well with a linear change in the measurement-corruption
noise covariance, and thus a strictly linear spacing of the elemental filters
would have given an unnecessary concentration of filters at larger values of R.
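The link between the quadratic form and a linear dependence on the true covariance can be checked with a small Monte Carlo experiment. This is a sketch under stated assumptions, not a reproduction of the simulations in this chapter: the residuals are drawn directly as zero-mean Gaussian vectors of dimension 5 with an illustrative covariance of 10 I, and the filter-assumed covariance is a scalar multiple of the truth. The names `likelihood_quotient` and `means` are hypothetical.

```python
import numpy as np

def likelihood_quotient(residual, covariance):
    """L = r^T A^{-1} r for one residual vector r and covariance matrix A."""
    return float(residual @ np.linalg.solve(covariance, residual))

rng = np.random.default_rng(1)
M = 5                                   # residual dimension (assumed)
a_true = 10.0 * np.eye(M)               # illustrative true residual covariance
samples = rng.multivariate_normal(np.zeros(M), a_true, size=20000)

means = {}
for scale in (0.2, 1.0, 5.0):           # filter-assumed covariance = scale * truth
    a_k = scale * a_true
    means[scale] = float(np.mean([likelihood_quotient(r, a_k) for r in samples]))
    print(f"assumed/true = {scale}: mean L = {means[scale]:.2f}")
```

The sample mean of L sits near M when the assumed covariance matches the truth and scales in proportion to the ratio of true to assumed covariance otherwise, so a linear ramp in the true covariance produces a linear ramp in the mean quotient.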
Figure 5.46 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 1. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.46 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 1 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.47 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 2. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.47 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 2 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.48 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 3. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.48 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 3 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.49 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 4. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.49 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 4 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.50 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 5. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.50 Simulation 4, Case 1 (increasing Rtrue): Elemental Filter 5 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.51 Simulation 4, Case 1 (increasing Rtrue): Blended Filter. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Rod RMS temperature error.

Figure 5.51 Simulation 4, Case 1 (increasing Rtrue): Blended Filter (cont’d).
(h) Rod temperature at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod
temperature at ti = 0.29 sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature
at ti = 0.57 sec. (m) Rod temperature at ti = 0.71 sec. (n) Rod temperature
at ti = 0.86 sec. (o) Rod temperature at ti = 1.00 sec.

Figure 5.52 Simulation 4, Case 2 (decreasing Rtrue): Hypothesis conditional probability.

5.5.2 Simulation 4, Case 2 (Decreasing Rtrue). In this linearly decreasing
case, the true measurement-corruption noise covariance eigenvalues were varied for
ti = {0, 0.01, 0.02, ..., 1} according to

$$R_{\text{true}}(t_i) = 100\,(1.01 - t_i) \tag{5.13}$$

Thus Rtrue begins at 101 and decreases to 1. Figure 5.52 shows the probability flow
between the elemental filters as Rtrue changes.

Filter   R      Rtrue when pk > 0.5   Rtrue when pk > 0.1
1        0.4    -                     -
2        2      1                     1, 2, 3
3        10     2, ..., 17            1, ..., 23
4        50     18, ..., 101          10, ..., 101
5        250    -                     85, ..., 101

Table 5.10 Simulation 4, Case 2 (decreasing Rtrue): Best hypothesis. Filters 1 and
5 were never the best filters during this simulation; hence the bars.

Figure 5.53 Simulation 4, Case 2 (decreasing Rtrue): Likelihood quotient. This
plot has been clipped at L = 20 since only the elemental filters operating near L = 5
absorb an appreciable amount of probability.

The third column in Table 5.10 gives the range of Rtrue values for which each filter is
the best filter. Note that the ranges are mutually exclusive. In the fourth column, note that
only two filters are “in force” at any one time with 10% or more likelihood. As
we mentioned in Case 1, by looking at the (h) plots for
the elemental filters, we can also assess the times when the filter is almost perfect, as
indicated by the lack of variation in the hypothesis conditional probability curves.
For example, elemental filter 4 shows very little variation from time 0.45 seconds until
0.65 seconds, when Rtrue decreased from about 55 to 35. This is slightly different
from the range encountered when Rtrue is increasing: that range is roughly 40 to
65. Thus for Case 2, we can see that the MMAE does not want to give up on the
elemental filter that has overestimated the severity of the noise (elemental filter 5) so
readily for the filter that is more conservatively modeled (elemental filter 4) while the
true measurement-corruption noise covariance is decreasing.

This phenomenon can be readily explained by using the likelihood quotient,
$L_k(t_i) = r_k^T(t_i)\,A_k^{-1}(t_i)\,r_k(t_i)$,
values given for the two elemental filters of concern. In this decreasing Rtrue case, the
MMAE seems to “hold” onto an elemental filter based on a too-high assumption for
the measurement-corruption noise covariance R. See Figure 5.53 for the likelihood
quotient histories for all five elemental filters; note that they are all linear (with
a negative slope) in response to the decreasing linear change in the measurement-corruption
noise covariance, and that the filter in force is the filter with an Lk closest in
value to M = 5. This tendency to hold on to an elemental filter based on a too-high
assumption for R can be explained by noting that the slope of the likelihood
quotient history for the fourth elemental filter is greater than the slope for the fifth
elemental filter; thus, while elemental filter 5 simply attempts to maintain its share
of the probability by slowly diverging from a likelihood quotient, L5, near five, elemental filter
4 works harder to gather all of the probability flow unto itself by rapidly approaching
L4 = 5.
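The asymmetry between the two cases can be read directly from the pk > 0.5 columns of Tables 5.9 and 5.10. The short sketch below transcribes those ranges and converts the handover between elemental filters 3 and 4 into sample times by inverting Equations (5.12) and (5.13); the dictionary layout and helper names are illustrative only.

```python
import math

# Rtrue ranges over which each filter holds more than half of the probability,
# transcribed from Table 5.9 (increasing Rtrue) and Table 5.10 (decreasing Rtrue).
best_increasing = {1: (1, 1), 2: (2, 6), 3: (7, 26), 4: (27, 101)}
best_decreasing = {2: (1, 1), 3: (2, 17), 4: (18, 101)}

def handover(best, lower, upper):
    """Rtrue boundary between two neighbouring filters, returned as
    (last value held by the lower filter, first value held by the upper)."""
    return best[lower][1], best[upper][0]

def time_increasing(r_true):
    """Invert Equation (5.12): Rtrue = 100 (t + 0.01)."""
    return r_true / 100.0 - 0.01

def time_decreasing(r_true):
    """Invert Equation (5.13): Rtrue = 100 (1.01 - t)."""
    return 1.01 - r_true / 100.0

up = handover(best_increasing, 3, 4)     # filter 3 hands over to filter 4
down = handover(best_decreasing, 3, 4)   # filter 4 hands over to filter 3
t_up = time_increasing(up[1])
t_down = time_decreasing(down[0])
print("increasing: filter 4 takes over at Rtrue =", up[1], "t =", round(t_up, 2))
print("decreasing: filter 4 gives way at Rtrue =", down[0], "t =", round(t_down, 2))
```

With a rising Rtrue, elemental filter 4 takes over at Rtrue = 27, whereas with a falling Rtrue it is retained until Rtrue has dropped to 18, so the filter with the larger assumed covariance holds the probability over a wider range of true values in the decreasing case.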

Overall, the MMAE state estimate was better for the increasing measurement-corruption
noise covariance case, although the final estimate at 1.0 seconds was better
for this decreasing covariance case because we had nearly perfect measurements
during the last 0.3 seconds of this simulation, as seen in Figure 5.59 in plots (m)
through (o). As stated earlier, we could have anticipated this overall result by
noting that, as Rtrue decreases, the elemental filter in force with the bulk of the
probability is still correct to a large degree, but gradually the true error covariance
of the measurements tightens up and an elemental filter with a smaller covariance
is slowly promoted. By not adapting more quickly, the state estimate suffers somewhat.
However, while the initial state estimate bias was partially masked by the high Rtrue,
the steadily improving measurements resulted in a steady convergence to the true
temperature profile along the rod, as seen in Figure 5.59 (g) for all time and then in
plots (h) through (o) along the length of the rod.
Figure 5.54 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 1. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.54 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 1 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.55 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 2. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.55 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 2 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.56 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 3. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.56 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 3 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.
Figure 5.57 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 4. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.57 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 4 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.58 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 5. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.58 Simulation 4, Case 2 (decreasing Rtrue): Elemental Filter 5 (cont’d).
(i) Rod temperature at ti = 0 sec. (j) Rod temperature at ti = 0.14 sec. (k) Rod
temperature at ti = 0.29 sec. (l) Rod temperature at ti = 0.43 sec. (m) Rod
temperature at ti = 0.57 sec. (n) Rod temperature at ti = 0.71 sec. (o) Rod
temperature at ti = 0.86 sec. (p) Rod temperature at ti = 1.00 sec.

Figure 5.59 Simulation 4, Case 2 (decreasing Rtrue): Blended Filter. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Rod RMS temperature error.

Figure 5.59 Simulation 4, Case 2 (decreasing Rtrue): Blended Filter (cont’d).
(h) Rod temperature at ti = 0 sec. (i) Rod temperature at ti = 0.14 sec. (j) Rod
temperature at ti = 0.29 sec. (k) Rod temperature at ti = 0.43 sec. (l) Rod temperature
at ti = 0.57 sec. (m) Rod temperature at ti = 0.71 sec. (n) Rod temperature
at  ti  =  0.86  sec.  (o)  Rod  temperature  at  U  =  1.00  sec. 
Filter    Q         R                      x0    P0
1         Q_true    0.04 R_true0 = 0.2     25    25
2         Q_true    0.2 R_true0 = 1        25    25
3         Q_true    R_true0 = 5            25    25
4         Q_true    5 R_true0 = 25         25    25
5         Q_true    25 R_true0 = 125       25    25

Table 5.11 Simulation 5: Elemental Filter Parameters. R_true0 represents the initial
value for R_true.

5.6 Simulation 5

In this simulation we investigate the MMAE’s ability to respond to an abrupt
change of the true measurement-corruption noise covariance, whereas in the last
simulation the MMAE adjusted to a smooth linear change. We shall assume that
we know the discrete set of the most likely R_true values that apply at any given time
of interest. This known set is used to build a bank of five elemental filters paired
to the five possible R_true values. The elemental filter design parameters are shown
in Table 5.11. Once again we have used the truth value for Q. To be thorough, we
also conducted this simulation with a dynamics noise strength two, five, and ten times
as large; there was no significant change in the parameter estimation performance, just
a minor slowing of the response to the change, which is somewhat masked when Q is
overestimated. Given the focus of this simulation, only the plots for the elemental
filters for the truth level of dynamics noise strength are shown.

Figure 5.60 Simulation 5: Hypothesis conditional probability.

Figure 5.61 Simulation 5: Abruptly changing measurement-corruption noise covariance.
Legend: O elemental filter; ★ true parameter within the filter bank. (The
filter spacing is nonlinear for illustration purposes.)

The true measurement-corruption noise covariance was abruptly changed during
the simulation as follows: in the first third of the scenario, filter 3 presents
the best model, as can be seen in Figure 5.60. In the second third, filter 4 is
the best hypothesized value, and finally, filter 2 matches best during the final
third. As expected, only one elemental filter achieved the ideal likelihood quotient,
L_k(t_i) = r_k^T(t_i) A_k^{-1}(t_i) r_k(t_i), of 5 during each period. For the first,
second, and third periods, the average likelihood quotient was 5 for elemental filters
3, 4, and 2, as shown in Figures 5.64(g), 5.65(g), and 5.63(g), respectively. This
progression is further illustrated in Figure 5.61.
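The behavior of the likelihood quotient can be checked numerically. The sketch below is only an illustration under stated assumptions: the residuals are taken to be zero-mean, white, and Gaussian with a diagonal covariance, the measurement dimension is taken to be m = 5 (suggested by the ideal value of 5 quoted above), and the function names are hypothetical. It averages r^T A^{-1} r for a filter whose assumed residual variance matches the truth and for one that assumes a variance five times too small.

```python
import random


def likelihood_quotient(residual, assumed_var):
    """Return r^T A^{-1} r for a diagonal filter-computed residual covariance A."""
    return sum(r * r / a for r, a in zip(residual, assumed_var))


def average_quotient(true_var, assumed_var, m=5, samples=20000, seed=0):
    """Monte Carlo average of the likelihood quotient (m = 5 is an assumption)."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5
    total = 0.0
    for _ in range(samples):
        # Residual drawn from the *true* statistics ...
        residual = [rng.gauss(0.0, sigma) for _ in range(m)]
        # ... but normalized by the variance the filter *assumes*.
        total += likelihood_quotient(residual, [assumed_var] * m)
    return total / samples


matched = average_quotient(true_var=1.0, assumed_var=1.0)
under_modeled = average_quotient(true_var=5.0, assumed_var=1.0)
print(f"matched filter: {matched:.2f}, under-modeled by five: {under_modeled:.2f}")
```

Under these assumptions the matched filter averages close to m, while the under-modeled filter averages close to five times that value.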

The clean, fast convergence displayed during the first change from elemental
filter 3 to filter 4 can be explained by noting that, as the true R experienced a positive
step increase, the assumptions upon which filter 3 was constructed were now being
violated and thus the probability flow to filter 4 was rapid. The second probability
flow transition, which was prompted by a large decrease in the true noise covariance,
was less crisp than the first transition. While the probability for filter 4 dropped off
rapidly, filter 3 initially earned a small share of the probability during this second
transition since it was not immediately clear that the measurement quality change
was real versus spurious. These observations match up well with the analysis based
on the likelihood quotients given in Simulation 4; there we noted that the MMAE
responds more quickly and adroitly to a positive change in R_true than it does to a
negative change in R_true. Thus, we would expect the same general performance for a
step change in R.

Period:                         First      Second       Third
Truth:                          R_true0    5 R_true0    R_true0/5
Filter 1 (R_1 = R_true0/25)     125        625          25
Filter 2 (R_2 = R_true0/5)      25         125          5
Filter 3 (R_3 = R_true0)        5          25           1
Filter 4 (R_4 = 5 R_true0)      1          5            1/5
Filter 5 (R_5 = 25 R_true0)     1/5        1            1/25

Table 5.12 Simulation 5: Expected likelihood quotients. R_true0 is the initial value
for R_true. The correctly modeled elemental filter will have a likelihood quotient of
about five.

Note that the shape of the likelihood quotient plot is the same for all of the
elemental filters; only the magnitude has changed in plots (g) of Figures 5.62
through 5.66. The expected magnitudes for each of the elemental filters in Table
5.12 can be found by using the steady state likelihood quotient for the kth elemental
filter at time t_i derived in Section 5.1 and repeated here for our convenience

    E{ L_k(t_i) } = [right-hand side not recoverable from the source]        (5.10)

When the hypothesis is correct, R_k = R_true, and then the right-hand side of Equation
(5.10) becomes m (five in this simulation). The interesting cases occur when
R_k ≠ R_true. For example, when R_true is “under”-modeled by a factor of five, then the
likelihood quotient is five times bigger than for a properly modeled filter. The
likelihood quotient of 25 occurs during the three times of the simulation when the
truth R is five times the assumed value in the elemental filter. Just the opposite
occurs when the truth R is one fifth of the assumed value in the filter. This also
happens three times during the simulation as noted in the table.
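The entries of Table 5.12 are consistent with a simple scaling: the ideal value of five multiplied by the ratio of the true R to the R assumed by the filter. The sketch below reproduces the table under that assumption. The scaling is inferred from the table and the surrounding discussion rather than transcribed from Equation (5.10), and the variable names are hypothetical.

```python
from fractions import Fraction

IDEAL_QUOTIENT = 5  # likelihood quotient of a correctly modeled filter (from the text)

# True R in each period and assumed R in each filter, both in units of R_true0.
truth_by_period = [Fraction(1), Fraction(5), Fraction(1, 5)]
filter_r = {
    1: Fraction(1, 25),
    2: Fraction(1, 5),
    3: Fraction(1),
    4: Fraction(5),
    5: Fraction(25),
}


def expected_quotient(r_true, r_assumed, ideal=IDEAL_QUOTIENT):
    """Assumed scaling: ideal value times (true R) / (assumed R)."""
    return ideal * r_true / r_assumed


expected_table = {
    k: [expected_quotient(r_true, r_k) for r_true in truth_by_period]
    for k, r_k in filter_r.items()
}

for k, row in expected_table.items():
    print(f"Filter {k}: " + "  ".join(str(value) for value in row))
```

Exact fractions are used so that entries such as 1/5 and 1/25 print exactly as they appear in the table.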

While the likelihood quotient is a good barometer for how well an elemental
filter matches the real environment, a quick review of how the hypothesis conditional
probability is calculated [see Equations (2.43) through (2.46) on pages 2-29 and 2-30]
reveals that there is another term that we should consider. The leading coefficient of
the Gaussian density, the β-term shown in Equation (2.45), plays an important role
when the residuals for multiple filters are approximately equal. Consider the scalar
measurement case: for approximately equal residuals, the elemental filter with the
smaller filter-computed residual covariance is “preferred” by the MMAE.
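The effect of the leading coefficient can be seen in a small numerical sketch of the scalar case. It assumes the standard MMAE recursion, in which each prior probability is weighted by the Gaussian density of that filter's residual and the results are renormalized; this is believed to match Equations (2.43) through (2.46), which are not reproduced in this section. The function names and the numbers are hypothetical.

```python
import math


def residual_density(residual, res_cov):
    """Scalar Gaussian density of a residual; beta is the leading coefficient."""
    beta = 1.0 / math.sqrt(2.0 * math.pi * res_cov)
    return beta * math.exp(-0.5 * residual ** 2 / res_cov)


def update_probabilities(prior, residuals, res_covs):
    """One step of the (assumed) hypothesis conditional probability recursion."""
    weighted = [
        p * residual_density(r, a)
        for p, r, a in zip(prior, residuals, res_covs)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two filters with equal priors and the same small residual, but different
# filter-computed residual covariances (1 versus 25).
posterior = update_probabilities(
    prior=[0.5, 0.5], residuals=[0.1, 0.1], res_covs=[1.0, 25.0]
)
print([round(p, 3) for p in posterior])
```

With equal residuals that are small relative to both covariances, the exponential terms are nearly identical, so the filter with the smaller residual covariance (and hence the larger leading coefficient) receives most of the probability.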

In the end, we anticipate that the innate capability of the MMAE to adapt to an
unknown noise environment will result in an improved state estimate. As we inspect
the blended filter plots in Figure 5.67 on page 5-153, we note that the majority of the
plots appear to have three distinct regions; this is due to the abruptly changing
truth R. In each of the time periods in which a different R_true was in force, the MMAE
quickly identified the best elemental filter. That filter produced a quality state
estimate for its time period and, because it gathered approximately all of the
probability during that period, it predominantly determined the blended filter
results. The initial R_true gave reasonably good
measurements and thus the MMAE was able to handle the initial state estimate
bias with ease. When the truth measurement-corruption noise covariance increased
by a factor of five in the second period, the MMAE quickly “decided” that the
measurement noise covariance had increased and filter 4 soon received the
bulk of the probability. At the end of the first period (from 0 to 0.33 sec), the RMS
error in plot (g) of Figure 5.67 was quite small, owing to the fact that R_true was
small. In the second period (from 0.33 to 0.67 sec), R_true was five times as large,
hence it should not come as a surprise that the mean-squared error would increase by
a factor of five initially. The improved measurement quality during the third period
brought the RMS error back down to its lowest levels. The two transitions can be
seen clearly in plots (g) and (h) of Figures 5.63, 5.64, and 5.65 for elemental filters
2, 3, and 4, respectively.
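The dominance of a single elemental filter in the blended result follows directly from how the blended estimate is formed. The sketch below assumes the usual MMAE blending, a probability-weighted sum of the elemental filter estimates; the temperatures and probabilities are hypothetical values chosen only to illustrate the point.

```python
def blend(estimates, probabilities):
    """Probability-weighted sum of elemental estimates (assumed MMAE blending).

    estimates: one list per filter, e.g. temperatures at p = 0, 0.5, and 1 m.
    probabilities: hypothesis conditional probabilities, summing to one.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    n_states = len(estimates[0])
    return [
        sum(p * est[j] for p, est in zip(probabilities, estimates))
        for j in range(n_states)
    ]


# Hypothetical estimates (deg C) from three elemental filters.
elemental_estimates = [
    [21.8, 21.1, 20.4],
    [22.0, 21.0, 20.0],
    [23.5, 22.4, 21.2],
]
# The second filter has gathered nearly all of the probability.
probabilities = [0.01, 0.98, 0.01]

blended = blend(elemental_estimates, probabilities)
print([round(value, 3) for value in blended])
```

When one filter holds nearly all of the probability, the blended estimate is almost indistinguishable from that filter's own estimate.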

For the two elemental filters designed to model poor quality measurements,
elemental filters 4 and 5, poor initial transients create the unusual circumstance
seen in Figures 5.65 and 5.66, plots (d) and (f). As noted earlier, the adequacy of
the initial state covariance can be checked by inspecting plots (b), (d), and (f); the
initial error should be within the 1σ (i.e., one standard deviation) bounds created by
the initial state covariance. If this is not true, then convergence is greatly hampered
since the filter has been told that its initial condition errors are much smaller than
they really are.
Figure 5.62 Simulation 5: Elemental Filter 1. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood quotient.
(h) Hypothesis conditional probability.

Figure 5.62 Simulation 5: Elemental Filter 1 (cont’d). (i) Rod temperature at
t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod temperature at t_i = 0.29
sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod temperature at t_i = 0.57 sec.
(n) Rod temperature at t_i = 0.71 sec. (o) Rod temperature at t_i = 0.86 sec. (p) Rod
temperature at t_i = 1.00 sec.

Figure 5.63 Simulation 5: Elemental Filter 2. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood quotient.
(h) Hypothesis conditional probability.

Figure 5.63 Simulation 5: Elemental Filter 2 (cont’d). (i) Rod temperature at
t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod temperature at t_i = 0.29
sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod temperature at t_i = 0.57 sec.
(n) Rod temperature at t_i = 0.71 sec. (o) Rod temperature at t_i = 0.86 sec. (p) Rod
temperature at t_i = 1.00 sec.

Figure 5.64 Simulation 5: Elemental Filter 3. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood quotient.
(h) Hypothesis conditional probability.

Figure 5.64 Simulation 5: Elemental Filter 3 (cont’d). (i) Rod temperature at
t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod temperature at t_i = 0.29
sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod temperature at t_i = 0.57 sec.
(n) Rod temperature at t_i = 0.71 sec. (o) Rod temperature at t_i = 0.86 sec. (p) Rod
temperature at t_i = 1.00 sec.

Figure 5.65 Simulation 5: Elemental Filter 4. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood quotient.
(h) Hypothesis conditional probability.


[Plot residue for Figure 5.65 (cont’d): panels (i) through (p), rod temperature (deg C, axis ticks 18 to 26) versus position (m, 0 to 1). RMS error annotations that survive extraction: 5, 0.57, 0.36, 0.26, and 0.1 deg C.]


Figure 5.65 Simulation 5: Elemental Filter 4 (cont’d). (i) Rod temperature at
t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod temperature at t_i = 0.29
sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod temperature at t_i = 0.57 sec.
(n) Rod temperature at t_i = 0.71 sec. (o) Rod temperature at t_i = 0.86 sec. (p) Rod
temperature at t_i = 1.00 sec.

Figure 5.66 Simulation 5: Elemental Filter 5. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Likelihood quotient.
(h) Hypothesis conditional probability.

Figure 5.66 Simulation 5: Elemental Filter 5 (cont’d). (i) Rod temperature at
t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod temperature at t_i = 0.29
sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod temperature at t_i = 0.57 sec.
(n) Rod temperature at t_i = 0.71 sec. (o) Rod temperature at t_i = 0.86 sec. (p) Rod
temperature at t_i = 1.00 sec.

Figure 5.67 Simulation 5: Blended Filter. (a) Rod temperature at p = 0 m.
(b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m. (d) Error at p = 0.5 m.
(e) Rod temperature at p = 1 m. (f) Error at p = 1 m. (g) Rod RMS temperature
error.

Figure 5.67 Simulation 5: Blended Filter (cont’d). (h) Rod temperature at t_i = 0
sec. (i) Rod temperature at t_i = 0.14 sec. (j) Rod temperature at t_i = 0.29 sec.
(k) Rod temperature at t_i = 0.43 sec. (l) Rod temperature at t_i = 0.57 sec. (m) Rod
temperature at t_i = 0.71 sec. (n) Rod temperature at t_i = 0.86 sec. (o) Rod
temperature at t_i = 1.00 sec.
Q    R    x0    κ
5    5    20    0.86 (aluminum)

Table 5.13 Simulation 6: Truth Parameters

Filter    Q         R         x0    P0    κ
1         Q_true    R_true    25    25    0.011 (granite)
2         Q_true    R_true    25    25    0.12 (cast iron)
3         Q_true    R_true    25    25    0.86 (aluminum)
4         Q_true    R_true    25    25    1.14 (copper)
5         Q_true    R_true    25    25    1.71 (silver)

Table 5.14 Simulation 6: Elemental Filter Parameters. The diffusivity constants
are taken from Boyce and DiPrima [26].

5.7 Simulation 6

In this final simulation we examine the system identification capabilities of
the MMAE. Some of the important truth system parameters are given in Table
5.13; take note that the thermal diffusivity constant is no longer 1, but has been
programmed to represent an aluminum rod. We created five elemental filters
assuming that we know the true noise environment statistics, i.e., zero-mean Gaussian
stochastic processes with known covariance matrices, and a 5 °C bias in our initial
state (temperature) estimate. Each of the five elemental filters is programmed with
a distinct choice for the thermal diffusivity constant, κ, corresponding to a known
thermal diffusivity for a specific material as shown in Table 5.14. Physically, as
diffusivity increases, resistance to heat flow decreases. A lower diffusivity means that
heat flows (or diffuses) more slowly through the material. In other words, materials
with “high” diffusivity constants achieve thermal equilibrium faster than materials
such as granite. Notice how closely spaced the diffusivity constants (the parameters
in this case) are for aluminum, copper, and silver.
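The link between diffusivity and the approach to thermal equilibrium can be illustrated with a small finite-difference experiment. The sketch below is not the simulation model used in this chapter: it assumes a unit-length rod with insulated ends, an explicit finite-difference scheme, a hypothetical initial condition (20 deg C everywhere except a 30 deg C left end), and it treats the diffusivity constants of Table 5.14 as dimensionless numbers. It reports how much temperature difference remains along the rod after a fixed time.

```python
def temperature_spread(kappa, t_end=0.1, nodes=11, dt=0.001):
    """Max-minus-min rod temperature after t_end for diffusivity kappa.

    Explicit scheme for u_t = kappa * u_xx on a unit rod with insulated
    (zero-flux) ends; all of these modeling choices are assumptions.
    """
    dx = 1.0 / (nodes - 1)
    lam = kappa * dt / dx ** 2
    if lam > 0.5:
        raise ValueError("explicit scheme unstable: reduce dt")
    u = [20.0] * nodes
    u[0] = 30.0  # hypothetical hot spot at the left end
    for _ in range(int(round(t_end / dt))):
        nxt = u[:]
        for j in range(1, nodes - 1):
            nxt[j] = u[j] + lam * (u[j + 1] - 2.0 * u[j] + u[j - 1])
        # Zero-flux boundaries via mirrored ghost nodes.
        nxt[0] = u[0] + 2.0 * lam * (u[1] - u[0])
        nxt[-1] = u[-1] + 2.0 * lam * (u[-2] - u[-1])
        u = nxt
    return max(u) - min(u)


spreads = {
    name: temperature_spread(kappa)
    for name, kappa in [("granite", 0.011), ("aluminum", 0.86), ("silver", 1.71)]
}
for name, spread in spreads.items():
    print(f"{name}: remaining spread {spread:.3f} deg C")
```

Under these assumptions the remaining spread shrinks as the diffusivity grows, which is the sense in which the high-diffusivity materials reach equilibrium faster than granite.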

By introducing a purposeful excitation signal into the system (in this case, heat
applied to the left end of the rod), we increased the identifiability of the unknown
parameter dramatically and we were able to perform system identification at a very
fine level. Since we were introducing this excitation to the system, we also gave the
elemental filters a very good (but not exact) approximation of the heating profile.
The true heating profile at the left end of the rod (see footnote 9) is

\[
u_{\mathrm{true}}(t_i) =
\begin{cases}
\dfrac{0.01 + t_i}{0.1}\,\eta\nu, & 0 \le t_i \le 0.09 \\[4pt]
\eta\nu, & 0.1 \le t_i \le 0.75 \\[4pt]
0, & 0.76 \le t_i \le 1
\end{cases}
\tag{5.14}
\]

where η is the percentage (expressed as a fraction) of the maximum excitation used
and ν is the maximum heat added to the system. The approximate control signal fed
to the elemental filters was a Gaussian process with a mean of u_true(t_i) and a standard
deviation of 0.1 u_true(t_i); see Figure 5.68 for a graphical depiction of the true heating
profile (solid black line) plus two realizations of the approximate profile (given by
the dashed and dash-dot lines). Additionally, the relatively low melting point of
aluminum (660 °C, see [1]) compared to the other materials limits the amount of
(excitation) heat that we can safely apply to the system since we did not know what
the material was beforehand.
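The heating profile and the noisy version supplied to the elemental filters can be sketched in a few lines. The sketch assumes that Equation (5.14) is a linear ramp to the level ην over the first 0.09 seconds, a constant hold until 0.75 seconds, and zero afterwards, with sample times spaced 0.01 seconds apart; the sample period, the function names, and the random seed are assumptions made for illustration. The default ν = 1000 corresponds to the maximum excitation case described for Figure 5.68.

```python
import math
import random


def u_true(t, eta=1.0, nu=1000.0):
    """True heating profile at the left end of the rod (assumed form of Eq. 5.14)."""
    if t <= 0.09:
        return eta * nu * (0.01 + t) / 0.1  # steady ramp up to eta * nu
    if t <= 0.75:
        return eta * nu  # held constant
    return 0.0  # excitation removed


def u_filter(t, rng, eta=1.0, nu=1000.0):
    """Approximate profile given to the filters: mean u_true, std 0.1 * u_true."""
    mean = u_true(t, eta, nu)
    return rng.gauss(mean, 0.1 * mean)


rng = random.Random(1)
sample_times = [i / 100 for i in range(101)]  # assumed 0.01 sec sample period
true_profile = [u_true(t) for t in sample_times]
realization = [u_filter(t, rng) for t in sample_times]

# Excitation levels for the four experiments: full, one fifth, one tenth,
# and one twentieth of the maximum.
hold_levels = {eta: u_true(0.5, eta=eta) for eta in (1.0, 0.2, 0.1, 0.05)}
print(hold_levels)
```

Because the standard deviation is proportional to the mean, the realization is exactly zero once the excitation is removed.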

Footnote 9: We have assumed that our heating element is ideal and thus the temperature
at the interface, i.e., at the left end of the rod, is perfectly regulated so that it
matches the program given in Equation (5.14).

Figure 5.68 Simulation 6: The true rod heating profile (solid black line) applied
to the left end of the rod, plus two realizations (dashed and dash-dot lines) supplied
to the elemental filters for the maximum excitation case (η = 1) and temperature
ν = 1000 °C.

We shall report the findings for four experiments that differ only in the level of
system excitation used to aid the system identification process. The first experiment
under the Simulation 6 heading sought to identify the thermal diffusivity constant
(the unknown system parameter) using a “finely” discretized bank of elemental filters
in the presence of abundant system excitation (limited so that we don’t melt the rod
if it happens to be made of aluminum) to ascertain how quickly the system parameter
could be identified. We excited the system by heating the left end (p = 0) of the rod
for the first 0.75 seconds of the one-second duration run. [The cumulative effects
of the heating profile can be seen quite clearly in Figure 5.72(a), which shows the true
and estimated temperature for the left end of the rod for the best modeled elemental
filter; for 0.09 seconds the heat is steadily increased and is then held constant until
0.75 seconds.] Note that heating the rod is akin to dithering the actuator [57, 147,
186, 73] discussed in Section 2.4.11. By adding a known persistent excitation into
the system, the identifiability of a specific parameter of interest is increased. For the
maximum level of persistent system excitation, the MMAE quickly identifies the
thermal diffusivity constant as seen in Figure 5.69(a) [and Figure 5.72(h)]. Figure
5.69(a) shows clearly that elemental filter 3 is most likely correct, hence the material
in question is aluminum with a probability of nearly one.

In a second experiment, we determined that even one fifth of the system
excitation level used in the first experiment could achieve very good system
identification results, i.e., we could still tell that the material in question is aluminum
with about 90% certainty, as shown in Figure 5.69(b). At one tenth of the excitation,
the probability that the rod is aluminum drops to about 65%, with elemental filter 4
rising to almost one third probability that the material is copper; see Figure 5.69(c). At
this low level of excitation, the three elemental filters with large diffusivity constants
“look” and perform equally well in the beginning, the two filters with small κ
overestimate how hot the rod is and consequently have terrible residuals, and eventually
the relatively high diffusivity constant filters receive all of the probability. In the
fourth case, for one twentieth the original excitation magnitude, Figure 5.69(d) shows
us that the elemental filter tuned for aluminum was still the most likely of the filters
in the bank, with almost 0.5 probability, while the elemental filters featuring
larger diffusivity constants (copper and silver) received about one half of the total
probability. The full workup of plots is shown for this minimum level of excitation
in Figures 5.76 (on page 5-174) through 5.81. The individual elemental filter plots
associated with the moderate and low levels of excitation are not included in this
report since the trends are all the same (as can be seen by comparing corresponding
figures for the maximum and minimum system excitation cases).

Figure 5.69 Simulation 6: Hypothesis conditional probability. (a) Maximum
excitation. (b) Moderate excitation. (c) Low excitation. (d) Minimum excitation.

When  a  filter  is  based  on  a  model  with  a  thermal  diffusivity  constant  that 
is  well  below  the  true  value,  the  filter  “sees”  a  large  increase  in  temperature  as  a 
tremendous  increase  in  temperature  since  a  “small”  diffusivity  constant  means  that 
the  heat  slowly  diffuses  through  the  material.  Thus  the  huge  errors  indicated  in 
Figure  5.70  are  quite  clearly  seen  in  plots  (a)  through  (f)  and  the  RMS  error  values 
given  in  plots  (i)  through  (p).  While  the  absolute  errors  are  much  smaller  when  the 
system  excitation  is  much  lower,  the  same  trends  can  be  seen  in  Figure  5.76  for  the 
elemental  filter  that  is  set  up  to  identify  granite. 

At the other end of the spectrum, when the assumed κ is bigger than the true
diffusivity constant, the filter underestimates the true level of heat flow because it
“thinks” that the flow moves more swiftly than it truly can in the truth system with
the smaller κ. Once again, the maximum excitation experiment provides the best
example of this in Figure 5.74, and the minimum excitation plots exhibit a similar
trend; however, the MMAE does not correct itself as quickly, as can be seen in Figure
5.80.

As  expected,  all  of  the  elemental  filters  (whether  properly  modeled  or  not) 
saw  their  state  estimation  errors  decrease  after  the  excitation  was  removed.  This  is 
precisely  why  we  apply  persistent  system  excitation,  to  enable  the  MMAE  to  tell  the 
difference  between  the  elemental  filters.  Without  persistent  excitation,  the  system 
is  able  to  dampen  out  heat  flow  such  that  most  any  model  will  match  the  heat  flow 
characteristics  encoded  in  the  system  dynamics  model. 

Thus, at substantially lower levels of persistent excitation, we must more
coarsely discretize the set of values used to represent the diffusivity constant so
that the individual filters appear more distinct. Preliminary simulations indicate
that if we forego persistent system excitation, then (1) the discretization level needs
to be such that two (or even three) orders of magnitude separate the parameter
values used to specify the elemental filters, and (2) the MMAE takes longer to make a
“decision”; therefore, the simulation time must be increased to observe the behavior
of the filters.

There are of course an endless number of simulated experiments that we could
perform in order to characterize the performance of the MMAE more fully for this
problem. However, without actually running another simulation, we can pose the
following question: how can we improve the simulation results for this minimal
excitation experiment without coarsening the discretization? Recall that in the first
simulation we demonstrated how the Q/R ratio influenced the distinguishability of
the elemental filters. Since the diffusivity constant is part of the dynamics model,
if we had conducted an experiment with a smaller Q/R ratio, i.e., one in which the
propagated estimates are favored over the measurement-updated estimates, then we could
expect to enhance the results for the minimal excitation experiment because the
differences between the filters would be accentuated.

Throughout this chapter, the likelihood quotient has proved to be an excellent
guide to elemental filter performance since it considers both the residuals and the
filter-computed residual covariance in concert with one another; see, for example,
plots (g) of Figures 5.70 through 5.74 for one more set of illustrations. On the other
hand, in this experiment, plots (b), (d), and (f) of Figures 5.70 through 5.74 also
demonstrate clearly which elemental filter best represents the true system. A quick
review of these plots shows that only the third elemental filter has a mean estimation
error (given by the solid line) consistently bounded by the pair of dashed lines, which
are used to represent how well the filter “thinks” that it is performing. When the
solid line (actual performance) radically differs from the expected performance given
by the dashed lines, then we can be fairly certain that the model is not well matched
to the real world that we are simulating. This same trend is fairly evident for the
minimal excitation case too [see plots (b), (d), and (f) of Figures 5.76 through 5.80],
with the exception of filter 4, which looks nearly as well matched as filter 3 (to the
untrained eye).
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Figure  5.70  Simulation  6  (maximum  excitation):  Elemental  Filter  1.  (a)  Rod 

temperature  at  p  =  0  m.  (b)  Error  at  p  =  0  m.  (c)  Rod  temperature  at  p  —  0.5  m. 
(d)  Error  at  p  —  0.5  m.  (e)  Rod  temperature  at  p  =  1  m.  (f)  Error  at  p  =  1  m. 
(g)  Likelihood  quotient,  (h)  Hypothesis  conditional  probability. 
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Figure  5.70  Simulation  6  (maximum  excitation):  Elemental  Filter  1  (cont’d). 
(i)  Rod  temperature  at  £*  =  0  sec.  (j)  Rod  temperature  at  tt  =  0.14  sec.  (k)  Rod 
temperature  at  ti  =  0.29  sec.  (1)  Rod  temperature  at  U  =  0.43  sec.  (m)  Rod  temper¬ 
ature  at  U  =  0.57  sec.  (n)  Rod  temperature  at  tt  =  0.71  sec.  (o)  Rod  temperature 
at  U  =  0.86  sec.  (p)  Rod  temperature  at  tt  =  1.00  sec. 
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Figure  5.71  Simulation  6  (maximum  excitation):  Elemental  Filter  2.  (a)  Rod 

temperature  at  p  =  0  m.  (b)  Error  at  p  =  0  m.  (c)  Rod  temperature  at  p  —  0.5  m. 
(d)  Error  at  p  =  0.5  m.  (e)  Rod  temperature  at  p  =  1  m.  (f)  Error  at  p  =  1  m. 
(g)  Likelihood  quotient,  (h)  Hypothesis  conditional  probability. 
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Figure  5.71  Simulation  6  (maximum  excitation):  Elemental  Filter  2  (cont’d). 
(i)  Rod  temperature  at  £*  =  0  sec.  (j)  Rod  temperature  at  tt  =  0.14  sec.  (k)  Rod 
temperature  at  ti  =  0.29  sec.  (1)  Rod  temperature  at  U  =  0.43  sec.  (m)  Rod  temper¬ 
ature  at  U  =  0.57  sec.  (n)  Rod  temperature  at  tt  =  0.71  sec.  (o)  Rod  temperature 
at  U  =  0.86  sec.  (p)  Rod  temperature  at  tt  =  1.00  sec. 

Figure 5.72 Simulation 6 (maximum excitation): Elemental Filter 3. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.72 Simulation 6 (maximum excitation): Elemental Filter 3 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.73 Simulation 6 (maximum excitation): Elemental Filter 4. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.73 Simulation 6 (maximum excitation): Elemental Filter 4 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.74 Simulation 6 (maximum excitation): Elemental Filter 5. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.74 Simulation 6 (maximum excitation): Elemental Filter 5 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.75 Simulation 6 (maximum excitation): Blended Filter. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Rod RMS temperature error.

Figure 5.75 Simulation 6 (maximum excitation): Blended Filter (cont’d). (h) Rod
temperature at t_i = 0 sec. (i) Rod temperature at t_i = 0.14 sec. (j) Rod temperature
at t_i = 0.29 sec. (k) Rod temperature at t_i = 0.43 sec. (l) Rod temperature at
t_i = 0.57 sec. (m) Rod temperature at t_i = 0.71 sec. (n) Rod temperature at
t_i = 0.86 sec. (o) Rod temperature at t_i = 1.00 sec.

Figure 5.76 Simulation 6 (minimum excitation): Elemental Filter 1. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.




Figure 5.76 Simulation 6 (minimum excitation): Elemental Filter 1 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.77 Simulation 6 (minimum excitation): Elemental Filter 2. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.77 Simulation 6 (minimum excitation): Elemental Filter 2 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.78 Simulation 6 (minimum excitation): Elemental Filter 3. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.78 Simulation 6 (minimum excitation): Elemental Filter 3 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.79 Simulation 6 (minimum excitation): Elemental Filter 4. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.79 Simulation 6 (minimum excitation): Elemental Filter 4 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.80 Simulation 6 (minimum excitation): Elemental Filter 5. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Likelihood quotient. (h) Hypothesis conditional probability.

Figure 5.80 Simulation 6 (minimum excitation): Elemental Filter 5 (cont’d).
(i) Rod temperature at t_i = 0 sec. (j) Rod temperature at t_i = 0.14 sec. (k) Rod
temperature at t_i = 0.29 sec. (l) Rod temperature at t_i = 0.43 sec. (m) Rod
temperature at t_i = 0.57 sec. (n) Rod temperature at t_i = 0.71 sec. (o) Rod
temperature at t_i = 0.86 sec. (p) Rod temperature at t_i = 1.00 sec.

Figure 5.81 Simulation 6 (minimum excitation): Blended Filter. (a) Rod
temperature at p = 0 m. (b) Error at p = 0 m. (c) Rod temperature at p = 0.5 m.
(d) Error at p = 0.5 m. (e) Rod temperature at p = 1 m. (f) Error at p = 1 m.
(g) Rod RMS temperature error.

Figure 5.81 Simulation 6 (minimum excitation): Blended Filter (cont’d). (h) Rod
temperature at t_i = 0 sec. (i) Rod temperature at t_i = 0.14 sec. (j) Rod temperature
at t_i = 0.29 sec. (k) Rod temperature at t_i = 0.43 sec. (l) Rod temperature at
t_i = 0.57 sec. (m) Rod temperature at t_i = 0.71 sec. (n) Rod temperature at
t_i = 0.86 sec. (o) Rod temperature at t_i = 1.00 sec.

5.8 Summary

In the first five Monte Carlo simulations, we concentrated on adapting to an
uncertain noise environment. Specifically, in the first simulation we concentrated
on uncertain dynamics noise strength and showed how the Q/R ratio and
discretization level, d, affected the distinguishability of the elemental filters in the
filter bank, as characterized by the hypothesis conditional probability histories. In the
second simulation, we illustrated the importance of properly initializing the
elemental filters. Simulations three through five featured the identification of, and
adaptation of the MMAE to, an unknown (constant, linearly varying, and abruptly
changing) measurement-corruption noise covariance. Finally, in simulation six, we
demonstrated perhaps the most powerful aspect of the MMAE: accurately
identifying a structural aspect (or parameter) of the dynamical system, in this case the
thermal diffusivity constant.
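
The hypothesis conditional probability histories referred to above come from a recursive Bayesian update across the filter bank. The following is a minimal sketch of that update, not the dissertation's MATLAB implementation. It assumes a scalar random-walk state, a scalar measurement, Gaussian residuals, and a small lower bound on each probability; the names `mmae_probability_update`, `run_bank`, and `floor`, and all numerical values, are hypothetical choices made for illustration.

```python
import numpy as np

def mmae_probability_update(probs, residuals, residual_vars, floor=1e-3):
    """One MMAE step: p_k <- p_k * f_k(z) / sum_j p_j * f_j(z) (scalar measurement)."""
    probs = np.asarray(probs, dtype=float)
    r = np.asarray(residuals, dtype=float)
    a = np.asarray(residual_vars, dtype=float)
    # Gaussian density of each elemental filter's residual under its own hypothesis.
    likelihood = np.exp(-0.5 * r**2 / a) / np.sqrt(2.0 * np.pi * a)
    posterior = probs * likelihood
    posterior = posterior / posterior.sum()
    # Assumed lower bound so that no hypothesis is locked out permanently.
    posterior = np.maximum(posterior, floor)
    return posterior / posterior.sum()

def run_bank(r_hypotheses, r_true=1.0, q=0.01, steps=300, seed=0):
    """Run a bank of scalar Kalman filters that differ only in the assumed R."""
    rng = np.random.default_rng(seed)
    r_hyp = np.asarray(r_hypotheses, dtype=float)
    n = r_hyp.size
    x_hat = np.zeros(n)            # state estimate of each elemental filter
    p_cov = np.ones(n)             # error variance of each elemental filter
    probs = np.full(n, 1.0 / n)    # equal prior hypothesis probabilities
    x = 0.0                        # true state (random walk)
    for _ in range(steps):
        x += rng.normal(0.0, np.sqrt(q))
        z = x + rng.normal(0.0, np.sqrt(r_true))
        p_minus = p_cov + q        # propagate (transition is 1 for a random walk)
        resid = z - x_hat          # residual of each filter
        a = p_minus + r_hyp        # filter-computed residual variance
        probs = mmae_probability_update(probs, resid, a)
        gain = p_minus / a         # Kalman gain of each filter
        x_hat = x_hat + gain * resid
        p_cov = (1.0 - gain) * p_minus
    return probs
```

Calling `run_bank([0.1, 1.0, 10.0])` returns the final probabilities for three hypothesized measurement noise variances; in this sketch the probability mass should concentrate on the hypothesis that matches `r_true`.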

VI. Conclusions


6.1 Introduction

The goal of this chapter is twofold: firstly, to summarize the contributions,
both large and small, resulting from the research documented in this dissertation,
and secondly, to suggest some recommendations for future research. The focal point
of this dissertation is the infinite-dimensional sampled-data Kalman filter (ISKF)
given by Theorem 91, on page 3-85. In general, the other contributions discussed
below serve to motivate or support the development of the ISKF and the generalized
infinite-dimensional multiple model adaptive estimation (GIMMAE) framework, or
to illustrate their use in a practical problem. The recommendations for future work
represent just a few of the interesting paths that should be investigated to develop
this area of research more fully.

As illustrated in Table 3.1 on page 3-83, the Kalman filtering quartet that
began with the (discrete-time dynamics, sampled-data measurements) Kalman filter
[95], the (continuous-time dynamics, continuous-time measurements) Kalman-Bucy
filter [96], and the infinite-dimensional (continuous-time dynamics, continuous-time
measurements) Kalman-Bucy filter by Falb [51] is now complete with the addition
of the infinite-dimensional (discrete-time dynamics, sampled-data measurements)
Kalman filter (ISKF). More importantly, the ISKF provides the proper foundation
for crafting an exact and easily implemented algorithm for a digital computer, as
demonstrated in Chapter IV.

6.2  Contributions 

Our first contribution, which can be found in Section 1.3, is a simplified
presentation of multiple model adaptive estimation (MMAE) and its application to
navigation. This introductory development of the MMAE is intended to be a source
of motivation for new researchers; it was inspired by the illustrative “lost at sea”
example given in Section 1.5 of Maybeck [129].

The focus of this dissertation is on developing an abstract mathematical
algorithm and then demonstrating how one might implement this algorithm for a
practical problem. The background material given in Chapter II, while not normally
considered a contribution, contains a large collection of previous contributions from
the stochastic estimation literature; thus, it serves not only to prepare and motivate
the reader for understanding this research, but also to “prime the pump” for future
research directions.

With an eye towards future research, the ISKF was built incrementally upon a
sequence of stochastic estimators and increasingly specialized measurement models.
While the derivations in this research were certainly inspired by the published works
contained in the literature (including the advanced topics discussed in Section 1.4,
the extensive background leading to the present-day MMAE in Chapter II, and the
linear estimation theory given in Section 3.3), only the first five definitions of
Section 3.3 were borrowed (and cited) from the literature. The remainder of the
development contained in Sections 3.3 through 3.6 represents an original contribution.

The development of the ISKF is underpinned by conditional expectation¹; thus,
the conditional mean estimator posed on a separable Hilbert space (see Theorem 65)
represents the core of the ISKF. Our development of a conditional mean estimator
parallels the finite-dimensional case given by Scharf [170]. Following the initial
development, we showed that the conditional mean estimator solves the minimum
mean-squared error estimation problem posed in Definition 68. After a
measurement model for correlated states and observations (CSO) is defined in
Definition 70, a linear infinite-dimensional minimum variance unbiased estimator
(LIMVUE) for CSO is given in Theorem 71; this theorem takes the first big step
towards the ISKF. Next, a generalized linear measurement model is proposed in
Definition 72, and another LIMVUE based on this measurement model is given in
Theorem 75. Now that the basic estimation theory for a state stochastic process is
in place, we pose a generalized linear stochastic measurement model in Definition 76
and develop the corresponding LIMVUE for stochastic processes in Theorem 78.
Finally, the ISKF is given by Theorem 91 and the GIMMAE framework is developed
in Section 3.6.

¹ See Definition 55 and the references therein for a general treatment of conditional expectation.

While many (if not most) of the physical problems motivating this line of
research mathematically model the system dynamics using an infinite-dimensional
continuous-time description, such as a stochastic partial differential equation, our
goal is to craft a digital computer algorithm. Hence, there is an intermediate need to
transform the infinite-dimensional continuous-time model into an equivalent
infinite-dimensional discrete-time model, e.g., an infinite-dimensional difference
equation. The next step entails mapping the equivalent infinite-dimensional
discrete-time model to an essentially-equivalent finite-dimensional discrete-time
model; this is performed in Chapter IV. The work in Section 3.4 parallels that of
Maybeck [129] for finite-dimensional systems, and hence the idea is not original, but
the transformation process for infinite-dimensional systems is a contribution resulting
from this research. That is, the linkage between the continuous-time dynamics model
given in Definition 79 and the discrete-time dynamics model of Definition 80 is new
even though it “looks” nearly identical to the finite-dimensional work as exemplified
in Maybeck [129]; see Theorems 86 and 88 in particular.
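
As a hedged illustration of the continuous-to-discrete linkage described above, consider the following sketch. The symbols are assumptions chosen for this illustration and are not taken from Definitions 79 and 80. Suppose the continuous-time dynamics are dx(t) = A x(t) dt + G dβ(t), where A generates a strongly continuous semigroup T(t) and β is a Wiener process with incremental covariance Q. Sampling the mild solution at the times t_i, with Δt = t_{i+1} - t_i, gives a difference equation of the familiar finite-dimensional form:

```latex
\begin{align*}
x(t_{i+1}) &= T(\Delta t)\, x(t_i) + w_d(t_i), \\
w_d(t_i)   &= \int_{t_i}^{t_{i+1}} T(t_{i+1} - s)\, G \, d\beta(s), \\
Q_d        &= \int_{0}^{\Delta t} T(s)\, G\, Q\, G^{*}\, T^{*}(s)\, ds .
\end{align*}
```

In this sketch T(Δt) plays the role of the state transition matrix and Q_d the role of the discrete-time dynamics noise covariance in the finite-dimensional development.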

The purpose of the extended example problem given in Chapter IV is to
develop a method for using the ISKF in an MMAE to estimate the temperature profile
along a slender cylindrical rod; the resulting structure, realized in the MATLAB
programming environment, is the approximate infinite-dimensional MMAE
(AIMMAE). By approximating the temperature (state) function, we are able to use the
infinite-dimensional structural and statistical components of the ISKF without
further approximations. Using the results of Section 3.4, we transformed the
infinite-dimensional continuous-time system model into an equivalent
infinite-dimensional discrete-time model. Next, an essentially-equivalent
finite-dimensional discrete-time model was derived by approximating the state
temperature function using a finite number of terms from a Fourier series expansion
and then determining the resulting form of the ISKF components. This approximation
technique gives rise to a sampled-data Kalman filter used to generate an optimal
estimate of a predetermined finite number of the coefficients associated with the
Fourier series expansion of the true state temperature function.
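
To make the Fourier truncation concrete, the sketch below builds a finite-dimensional discrete-time model for a one-dimensional heat equation. It is an illustration under stated assumptions rather than the model of Chapter IV: it assumes insulated rod ends (so the modes are cosines), point temperature sensors at p = 0, 0.5, and 1 m, and independent white noise of equal strength `q` driving each mode. The function name `modal_heat_model` and all parameter values are hypothetical.

```python
import numpy as np

def modal_heat_model(n_modes, alpha, dt, length=1.0,
                     sensors=(0.0, 0.5, 1.0), q=1.0):
    """Truncated cosine-series model of u_t = alpha * u_pp on [0, length].

    Returns (phi, qd, h): state transition matrix, discrete-time dynamics
    noise covariance, and measurement matrix for the first n_modes Fourier
    coefficients. Boundary conditions and noise model are assumptions.
    """
    k = np.arange(n_modes)
    lam = alpha * (k * np.pi / length) ** 2       # decay rate of mode k
    phi = np.diag(np.exp(-lam * dt))              # each coefficient decays independently
    # Variance accumulated over one sample period by white noise of strength q:
    # q * (1 - exp(-2*lam*dt)) / (2*lam), with the limit q*dt when lam == 0.
    safe_lam = np.where(lam > 0, lam, 1.0)
    qd = np.where(lam > 0,
                  q * (1.0 - np.exp(-2.0 * lam * dt)) / (2.0 * safe_lam),
                  q * dt)
    # Row j evaluates every cosine mode at sensor position p_j.
    h = np.cos(np.outer(np.asarray(sensors, dtype=float), k) * np.pi / length)
    return phi, np.diag(qd), h

# Example: four modes, hypothetical diffusivity and sample period.
phi, qd, h = modal_heat_model(n_modes=4, alpha=0.1, dt=0.05)
```

The state of the resulting Kalman filter is the vector of retained Fourier coefficients, and `h @ coefficients` reconstructs the approximate temperature at the sensor positions. Retaining more modes enlarges the subspace used to approximate the state function at the cost of a larger filter.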

In Chapter V we found that the AIMMAE was quite capable of estimating the
state in a variety of uncertain noise environments as well as performing the task of
system identification. System identification was not an original goal of this work;
nevertheless, the AIMMAE accomplishes the task quite well.

6.3  Recommendations 

There are several main threads of inquiry that warrant considerably more
attention than given in this dissertation. Characterizing the ISKF, expanding the class
of problems to which the ISKF can be applied, improving the state function
approximation, and employing moving-bank MMAE structures represent interesting
areas for further research.

While the ISKF was fully developed, we have made no attempt to characterize
the controllability (the property of being able to steer the system between two
arbitrary points in the state space) or observability (the property of being able to
determine the initial state uniquely using only the knowledge of the output) of the
infinite-dimensional dynamics and measurement models upon which the estimator was
based, or the stability of the ISKF itself. See the texts by Curtain on
infinite-dimensional system theory for a good discussion regarding these topics [38, 39].

Another area for exploration would be to expand the class of
infinite-dimensional problems to include two-parameter semigroups of bounded linear
operators; see, for example, Pazy [160].

In transforming the infinite-dimensional continuous-time system model into an
equivalent infinite-dimensional discrete-time model, we experienced no loss of
information. In contrast, the quality of the essentially-equivalent finite-dimensional
discrete-time model is wholly dependent on the method and subspace used to model
the state function, since the structural and statistical components of the ISKF were
developed in concert with the approximation chosen for the state function. Thus, the
performance of the ISKF could be enhanced by optimizing the manner in which the
state function is approximated.

In Chapter II we introduced several moving-bank structures. Depending on
the application, one of these might improve state and/or parameter estimation
performance relative to the fixed-bank MMAE used in this research.

time-delayed differential equations along with a possibly infinite-dimensional measurement model. The Kalman filtering technique
was extended to encompass infinite-dimensional continuous-time systems with sampled-data measurements, and a technique was
developed to approximate an infinite-dimensional continuous-time system model with an essentially equivalent finite-dimensional
discrete-time model upon which a filtering algorithm could be based. The tools developed during this research were demonstrated
using an estimation problem based on a stochastic partial differential equation with an unknown noise environment.